The minimum entropy output of a quantum channel is locally additive 
We show that the minimum von Neumann entropy output of a quantum channel is locally additive. Hastings's counterexample to the additivity conjecture makes this result quite surprising. In particular, it indicates that the non-additivity of the minimum entropy output is a global effect of quantum channels.
I. INTRODUCTION



One of the most fundamental questions in quantum information concerns the amount of information that can be transmitted reliably through a quantum channel. Despite significant progress in recent years […], as pointed out in [4], this question has remained surprisingly wide open. The main reason is that it depends on whether the classical and quantum capacities of quantum channels are additive […]. Recently, it was shown that both the Holevo expression for the classical capacity […] and the quantum capacity [27] are not additive in general. The additivity of the Holevo expression for the classical capacity was an open problem for more than a decade. Shor [26] showed that it is equivalent to three other additivity conjectures: the additivity of entanglement of formation, the strong super-additivity of entanglement of formation, and the additivity of the minimum entropy output of a quantum channel.

In [14] Hastings gave a counterexample to the last of the above additivity conjectures and thereby proved that they are all false. Hastings's counterexamples (see also […]) exist in very high dimensions, and an estimate of these dimensions can be found in [11]. Earlier, in [26], Shor pointed out that if the additivity conjectures were true, the first step towards proving them would perhaps be to prove local additivity. We show here that this local additivity conjecture is indeed true, despite the existence of counterexamples to the original additivity conjectures. Our results therefore demonstrate that the counterexamples to the original additivity conjecture reflect a global effect of quantum channels.

As we pointed out in Appendix B of [10], both the local and the global additivity conjectures are false over the real numbers. This implies that a straightforward argument based only on directional derivatives cannot prove local additivity in the general complex case. Hence, our proof of local additivity makes essential use of the complex structure.

In quantum information theory, quantum channels are the natural generalizations of stochastic communication channels in classical information theory. They are described by completely positive trace-preserving linear maps (CPT maps). A CPT map $\mathcal{N}: H_{d_{in}}\to H_{d_{out}}$ takes the set $H_{d_{in}}$ of $d_{in}\times d_{in}$ Hermitian matrices to a subset of the set $H_{d_{out}}$ of all $d_{out}\times d_{out}$ Hermitian matrices. By the Stinespring dilation theorem, any finite-dimensional quantum channel can be written as a unitary embedding followed by a partial trace: for any CPT map $\mathcal{N}$ there exists an ancillary space of Hermitian matrices $H_E$ such that

$$\mathcal{N}(\rho) = \operatorname{Tr}_E\left[U\left(\rho\otimes|0\rangle_E\langle 0|\right)U^{\dagger}\right],$$

where $\rho\in H_{d_{in}}$ and $U$ is a unitary matrix mapping states $|\psi\rangle|0\rangle_E$ with $\psi\in H_{d_{in}}$ to $H_{d_{out}}\otimes H_E$.
The minimum entropy output of a quantum channel $\mathcal{N}$ is defined by

$$S_{\min}(\mathcal{N}) = \min_{\rho\in H_{d_{in},+,1}} S(\mathcal{N}(\rho)),$$

where $H_{d_{in},+,1}\subset H_{d_{in}}$ is the set of all $d_{in}\times d_{in}$ positive semidefinite matrices with trace one (i.e. density matrices), and $S(\rho)=-\operatorname{Tr}(\rho\log\rho)$ is the von Neumann entropy. Since the von Neumann entropy is concave and $\mathcal{N}$ is linear, the minimization can be restricted to the rank-one matrices $\rho=|\psi\rangle\langle\psi|$ in $H_{d_{in},+,1}$.

For any such rank-one density matrix $\rho$ we can define a bipartite pure state $|\tilde\psi\rangle=U|\psi\rangle|0\rangle_E$ in the bipartite subspace $\mathcal{K}=\{|\tilde\psi\rangle : |\psi\rangle\in\mathbb{C}^{d_{in}}\}$. We therefore find that the minimum entropy output of the channel $\mathcal{N}$ can be expressed in terms of the entanglement of the bipartite subspace $\mathcal{K}$, defined by

$$E(\mathcal{K}) \equiv \min_{|\phi\rangle\in\mathcal{K},\ \|\phi\|=1} E(|\phi\rangle),$$

where $E(|\phi\rangle)\equiv S(\operatorname{Tr}_B(|\phi\rangle\langle\phi|))$ is the entropy of entanglement. In [13] it was pointed out that $E(\mathcal{K})=0$ unless $\dim\mathcal{K}\le(d_{out}-1)(d_E-1)$, where $d_E$ is the dimension of the ancillary system. This follows directly from the fact that an unextendible product basis contains at least $d_{out}+d_E-1$ (bipartite) states [3].
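This correspondence between the output entropy of $\mathcal{N}$ and the entropy of entanglement of states in $\mathcal{K}$ can be checked numerically for a single input. The following minimal sketch assumes Python with NumPy and natural logarithms. A random isometry `V` plays the role of $U$ restricted to inputs $|\psi\rangle|0\rangle_E$, and all function names are hypothetical helpers, not part of the paper. The sketch computes the channel output from Kraus operators and compares its entropy with the entropy of entanglement of $V|\psi\rangle$, which it reads off from singular values.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho) with the natural log; zero eigenvalues are skipped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-15]
    return float(-np.sum(evals * np.log(evals)))

def random_isometry(d_in, d_out, d_env, rng):
    """Hypothetical helper: a (d_out*d_env) x d_in matrix with orthonormal columns."""
    g = rng.normal(size=(d_out * d_env, d_in)) + 1j * rng.normal(size=(d_out * d_env, d_in))
    q, _ = np.linalg.qr(g)  # reduced QR, so q has orthonormal columns
    return q

def channel_output(V, rho, d_out, d_env):
    """N(rho) = sum_e K_e rho K_e^dagger, with Kraus operators K_e = (I (x) <e|) V."""
    V3 = V.reshape(d_out, d_env, -1)  # axes: (output, environment, input)
    out = np.zeros((d_out, d_out), dtype=complex)
    for e in range(d_env):
        K = V3[:, e, :]
        out += K @ rho @ K.conj().T
    return out

def entanglement_of_state(phi, d_out, d_env):
    """E(phi) = S(Tr_E |phi><phi|), from the singular values of phi as a d_out x d_env matrix."""
    p = np.linalg.svd(phi.reshape(d_out, d_env), compute_uv=False) ** 2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
d_in, d_out, d_env = 2, 3, 4
V = random_isometry(d_in, d_out, d_env, rng)
psi = rng.normal(size=d_in) + 1j * rng.normal(size=d_in)
psi /= np.linalg.norm(psi)

rho_out = channel_output(V, np.outer(psi, psi.conj()), d_out, d_env)
lhs = von_neumann_entropy(rho_out)                  # output entropy of the channel
rhs = entanglement_of_state(V @ psi, d_out, d_env)  # entropy of entanglement in K
print(lhs, rhs)
```

The two numbers agree up to rounding. Minimizing either one over $\psi$ therefore gives the same value.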

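With this notation, the non-additivity of the minimum entropy output of a quantum channel is equivalent to the existence of two subspaces $\mathcal{K}_1\subset\mathbb{C}^{n_1}\otimes\mathbb{C}^{m_1}$ and $\mathcal{K}_2\subset\mathbb{C}^{n_2}\otimes\mathbb{C}^{m_2}$ such that

$$E(\mathcal{K}_1\otimes\mathcal{K}_2) < E(\mathcal{K}_1)+E(\mathcal{K}_2).$$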

In what follows we will prove the local additivity of entanglement of subspaces, which is equivalent to the local 
additivity of the minimum entropy output. 

The rest of this paper is organized as follows. In section II we compute and simplify the first and second directional derivatives of the von Neumann entropy of entanglement. In section III we prove our main result, local additivity, stated in Theorem 5, for the non-singular case. In section IV we prove Theorem 5 for the singular case. We end with a discussion in section V.



II. LOCAL MINIMUM 



Let $\mathcal{K}\subset\mathbb{C}^n\otimes\mathbb{C}^m$ be a subspace of bipartite entangled states. Since the bipartite Hilbert space $\mathbb{C}^n\otimes\mathbb{C}^m$ is isomorphic to the Hilbert space $\mathbb{C}^{n\times m}$ of all $n\times m$ complex matrices, we can view any bipartite state $|\psi\rangle_{AB}=\sum_{i,j}x_{ij}|i\rangle|j\rangle$ in $\mathcal{K}$ as an $n\times m$ matrix $x$. The reduced density matrix of $|\psi\rangle_{AB}$ is then given by $\rho_r\equiv\operatorname{Tr}_B|\psi\rangle_{AB}\langle\psi|=xx^*$, and the entropy of entanglement of $|\psi\rangle_{AB}$ is given by

$$E(x) \equiv -\operatorname{Tr}(xx^*\log xx^*). \tag{1}$$

In our notation, we write $x^*$ instead of a dagger for the Hermitian conjugate of the matrix $x$.
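Eq. (1) only needs the singular values of $x$, because the nonzero eigenvalues of $xx^*$ are their squares. The following minimal sketch assumes NumPy and the natural logarithm, since the text does not fix a base here. `entanglement_entropy` is a hypothetical helper name.

```python
import numpy as np

def entanglement_entropy(x):
    """E(x) = -Tr(x x* log x x*) for a normalized n x m matrix x (natural log).

    The nonzero eigenvalues of x x* are the squared singular values of x,
    so no matrix logarithm is needed; zero eigenvalues contribute 0 log 0 = 0.
    """
    p = np.linalg.svd(x, compute_uv=False) ** 2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

x_max = np.eye(2) / np.sqrt(2)                          # maximally entangled two-qubit state
x_prod = np.array([[1.0, 0.0], [0.0, 0.0]])             # product state |0>|0>
x_rect = np.array([[0.6, 0.0, 0.0], [0.0, 0.8, 0.0]])   # 2 x 3, singular values 0.6 and 0.8
print(entanglement_entropy(x_max))   # should be close to log 2
print(entanglement_entropy(x_prod))  # should be close to 0
print(entanglement_entropy(x_rect))
```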

If $x\in\mathcal{K}$ is a local minimum of $E$ in $\mathcal{K}$, then there exists a neighbourhood of $x$ in $\mathcal{K}$ in which $x$ is the minimum. Any state in the neighbourhood of $x$ can be written as $ax+by$, where $a,b\in\mathbb{C}$ and $y\in\mathcal{K}$ is a matrix orthogonal to $x$, i.e. $\operatorname{Tr}(xy^*)=0$. We also assume that the state is normalized, so that $|a|^2+|b|^2=1$. Since $E(x)$ does not depend on the global phase, we can assume that $a$ is a positive real number. We can also assume that $b$ is real, because we can absorb its phase into $y$ (adding a phase to $y$ does not change its orthogonality to $x$). Thus, any normalized state in the neighbourhood of $x$ can be written as

$$\frac{x+ty}{\sqrt{1+t^2}}\quad\text{with}\quad\operatorname{Tr}(xy^*)=0,$$

where $t\equiv b/a$ is a small real number and $y$ is normalized (i.e. $\operatorname{Tr}(yy^*)=1$).
Definition 1.

(a) A matrix $x\in\mathcal{K}$ is said to be a critical point of $E(x)$ in $\mathcal{K}$ if

$$D_yE(x) \equiv \frac{d}{dt}E\left(\frac{x+ty}{\sqrt{1+t^2}}\right)\bigg|_{t=0} = 0\qquad\forall\,y\in x^{\perp},$$

where $D_yE(x)$ denotes the directional derivative of $E$ in the direction $y$, and $x^{\perp}\subset\mathcal{K}$ denotes the subspace of all matrices $y$ in $\mathcal{K}$ for which $\operatorname{Tr}(xy^*)=0$.

(b) A matrix $x\in\mathcal{K}$ is said to be a non-degenerate local minimum of $E(x)$ in $\mathcal{K}$ if it is critical and

$$D_y^2E(x) \equiv \frac{d^2}{dt^2}E\left(\frac{x+ty}{\sqrt{1+t^2}}\right)\bigg|_{t=0} > 0\qquad\forall\,y\in x^{\perp},\ y\neq0,$$

where we also allow $D_y^2E(x)=+\infty$. Moreover, a critical $x\in\mathcal{K}$ is said to be degenerate if there exists at least one direction $y$ such that $D_y^2E(x)=0$.
To prove local additivity we need to calculate the above directional derivatives. This can be done by expressing the logarithm as an integral [28] (see also [22], […]). In that technique, however, all quantities are expressed as integrals, and some of these integral expressions do not reveal additivity as transparently as the divided difference method does. We therefore use below a new technique based on the divided difference [16, (6.1.17)]. One advantage of the divided difference approach is that it gives all directional derivatives explicitly, with no integrals involved. Before introducing it, we briefly discuss the affine parametrization.

In our calculations we will assume that $x$ is diagonal, or equivalently, that the bipartite state represented by $x$ is given in its Schmidt form. This assumption follows from the singular value decomposition theorem: we can always find unitary matrices $u\in\mathbb{C}^{n\times n}$ and $v\in\mathbb{C}^{m\times m}$ such that $uxv$ is an $n\times m$ diagonal matrix with the non-negative singular values of $x$ on the diagonal. Since $E(x)=E(uxv)$, we can assume without loss of generality that $x$ is a diagonal matrix.
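The reduction to a diagonal $x$ can be illustrated with NumPy's SVD. NumPy returns $x=W\,\mathrm{diag}(s)\,V_h$, so $u=W^*$ and $v=V_h^*$ play the roles of the unitaries above. The sketch below makes these assumptions: a random normalized test matrix, the natural log, and a hypothetical helper name.

```python
import numpy as np

def entanglement_entropy(x):
    """E(x) from the squared singular values of x (natural log)."""
    p = np.linalg.svd(x, compute_uv=False) ** 2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(1)
n, m = 3, 4
x = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
x /= np.linalg.norm(x)  # Frobenius norm, so Tr(x x*) = 1

# NumPy factorizes x = W diag(s) Vh, hence u = W^*, v = Vh^* give u x v = diag(s).
W, s, Vh = np.linalg.svd(x)  # full_matrices=True: W is n x n, Vh is m x m
u, v = W.conj().T, Vh.conj().T
d = u @ x @ v

k = min(n, m)
expected = np.zeros((n, m))
expected[:k, :k] = np.diag(s)
print(np.allclose(d, expected))                          # u x v is diagonal
print(entanglement_entropy(x), entanglement_entropy(d))  # E is unchanged
```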



A. The Affine Parametrization 



Up to second order in $t$ we have

$$\rho(t) \equiv \frac{(x+ty)(x^*+ty^*)}{1+t^2} = \left(xx^*+t(xy^*+yx^*)+t^2yy^*\right)(1-t^2)+O(t^3)$$
$$= xx^*+t(xy^*+yx^*)+t^2(yy^*-xx^*)+O(t^3) = \rho+t\gamma_0+t^2\gamma_1+O(t^3), \tag{2}$$

where $\rho=xx^*$, $\gamma_0=xy^*+yx^*$, and $\gamma_1=yy^*-xx^*$. Note that $\operatorname{Tr}\rho=1$ and $\operatorname{Tr}\gamma_0=\operatorname{Tr}\gamma_1=0$. Here we assumed without loss of generality that $\operatorname{Tr}(yy^*)=1$, since the normalization factor of $y$ can be absorbed into $t$. We are interested in the first and second derivatives of

$$E\left(\frac{x+ty}{\sqrt{1+t^2}}\right) = S(\rho(t)) = S\left(\rho+t\gamma_0+t^2\gamma_1+O(t^3)\right).$$
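The expansion (2) can be checked numerically. A short calculation from the definitions gives the exact remainder $\rho(t)-(\rho+t\gamma_0+t^2\gamma_1)=-t^3(\gamma_0+t\gamma_1)/(1+t^2)$. The ratio of the error norm to $t^3$ should therefore approach $\|\gamma_0\|$. The sketch assumes NumPy and random test matrices. It is an illustration, not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
x = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
x /= np.linalg.norm(x)                 # Tr(x x*) = 1
y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
y -= np.vdot(x, y) * x                 # Gram-Schmidt step: now Tr(x y*) = 0
y /= np.linalg.norm(y)                 # Tr(y y*) = 1

rho = x @ x.conj().T
gamma0 = x @ y.conj().T + y @ x.conj().T
gamma1 = y @ y.conj().T - rho

def rho_exact(t):
    """rho(t) = (x + t y)(x + t y)* / (1 + t^2)."""
    z = (x + t * y) / np.sqrt(1 + t**2)
    return z @ z.conj().T

ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    err = np.linalg.norm(rho_exact(t) - (rho + t * gamma0 + t**2 * gamma1))
    ratios.append(err / t**3)          # equals ||gamma0 + t gamma1|| / (1 + t^2) up to rounding
print(ratios, np.linalg.norm(gamma0))
print(abs(np.trace(gamma0)), abs(np.trace(gamma1)))  # both vanish up to rounding
```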



In this section we assume that $\rho=xx^*$ is an $n\times n$ non-singular matrix. Denote

$$\sigma(t) = \rho+t\gamma_0.$$

In the next proposition we relate $S(\rho(t))$ to $S(\sigma(t))$.

Proposition 1. Let $\rho(t)$, $\sigma(t)$, $\rho$, $\gamma_0$ and $\gamma_1$ be as above. Then

$$S(\rho(t)) = S(\sigma(t))-t^2\operatorname{Tr}[\gamma_1\log\rho]+O(t^3). \tag{3}$$



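Proof. Since $\rho$ is non-singular, $\rho(t)$ and $\sigma(t)$ are also non-singular for small enough $t$. Thus, $0\le I-\rho(t)<I$ for small $t$. Using the Taylor expansion

$$\log\rho(t) = \log\left[I-(I-\rho(t))\right] = -\sum_{n=1}^{\infty}\frac{1}{n}\left(I-\rho(t)\right)^n,$$

we get

$$-\operatorname{Tr}[\rho\log\rho(t)] = \sum_{n=1}^{\infty}\frac{1}{n}\operatorname{Tr}\left[\rho\left(I-\sigma(t)-t^2\gamma_1\right)^n\right].$$

Expanding the term inside the trace above up to second order in $t$ gives

$$\operatorname{Tr}\left[\rho\left(I-\sigma(t)-t^2\gamma_1\right)^n\right] = \operatorname{Tr}\left[\rho\left(I-\sigma(t)\right)^n\right]-t^2n\operatorname{Tr}\left[\rho(I-\rho)^{n-1}\gamma_1\right]+O(t^3).$$

We therefore have

$$-\operatorname{Tr}[\rho\log\rho(t)] = -\operatorname{Tr}[\rho\log\sigma(t)]-t^2\sum_{n=1}^{\infty}\operatorname{Tr}\left[\rho(I-\rho)^{n-1}\gamma_1\right]+O(t^3).$$

Since $\rho^{-1}=\sum_{n=1}^{\infty}(I-\rho)^{n-1}$ and $\operatorname{Tr}(\gamma_1)=0$, we conclude

$$\operatorname{Tr}[\rho\log\rho(t)] = \operatorname{Tr}[\rho\log\sigma(t)]+O(t^3).$$

Moreover, $\log\rho(t)-\log\sigma(t)=O(t^2)$. Hence $t\operatorname{Tr}[\gamma_0\log\rho(t)]=t\operatorname{Tr}[\gamma_0\log\sigma(t)]+O(t^3)$ and $t^2\operatorname{Tr}[\gamma_1\log\rho(t)]=t^2\operatorname{Tr}[\gamma_1\log\rho]+O(t^3)$. Thus,

$$\operatorname{Tr}[\rho(t)\log\rho(t)] = \operatorname{Tr}[\sigma(t)\log\sigma(t)]+t^2\operatorname{Tr}[\gamma_1\log\rho]+O(t^3).$$

This completes the proof. □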

This simple relation between $S(\rho(t))$ and $S(\sigma(t))$ is very useful, since it lets us focus on the Taylor expansion of the simpler function $S(\sigma(t))$.
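Proposition 1 can be tested numerically. When $t$ shrinks by a factor of 10, the remainder $S(\rho(t))-S(\sigma(t))+t^2\operatorname{Tr}[\gamma_1\log\rho]$ should shrink by a factor of about $10^3$ or more. Without the $t^2$ correction, it would shrink by only about $10^2$. The following minimal NumPy sketch assumes a diagonal non-singular $\rho$ with a hand-picked spectrum, a random direction $y$, and natural logarithms.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy with the natural log (Hermitian input)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])              # spectrum of rho = x x*, non-singular
x = np.diag(np.sqrt(p)).astype(complex)
y = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
y -= np.vdot(x, y) * x                     # Tr(x y*) = 0
y /= np.linalg.norm(y)

rho = x @ x.conj().T
gamma0 = x @ y.conj().T + y @ x.conj().T
gamma1 = y @ y.conj().T - rho
tr_gamma1_log_rho = float(np.real(np.sum(np.diag(gamma1) * np.log(p))))  # rho is diagonal

def remainder(t):
    """S(rho(t)) - S(sigma(t)) + t^2 Tr[gamma1 log rho], which Eq. (3) says is O(t^3)."""
    z = (x + t * y) / np.sqrt(1 + t**2)
    rho_t = z @ z.conj().T
    sigma_t = rho + t * gamma0
    return entropy(rho_t) - entropy(sigma_t) + t**2 * tr_gamma1_log_rho

r1, r2 = remainder(1e-2), remainder(1e-3)
print(r1, r2, abs(r2 / r1))
```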

B. The method of divided difference 

To calculate the first and second derivatives of $S(\sigma(t))$, we first evaluate the Taylor expansion of a complex-valued function $f:\mathbb{C}\to\mathbb{C}$, which we later assume can be extended to act on $n\times n$ complex matrices.

We will use the divided difference of $f$; we refer the reader to [16, (6.1.17)] for more details. Given a sequence of distinct complex points $\alpha_i\in\mathbb{C}$, $i=1,\ldots,n$, the divided difference of a function $f:\mathbb{C}\to\mathbb{C}$ is defined for $i=0,1$ by

$$\Delta^0f(\alpha_1) \equiv f(\alpha_1), \tag{4}$$
$$\Delta^1f(\alpha_1,\alpha_2) \equiv \Delta f(\alpha_1,\alpha_2) := \frac{f(\alpha_1)-f(\alpha_2)}{\alpha_1-\alpha_2}, \tag{5}$$

and inductively by

$$\Delta^if(\alpha_1,\ldots,\alpha_i,\alpha_{i+1}) = \frac{\Delta^{i-1}f(\alpha_1,\ldots,\alpha_{i-1},\alpha_i)-\Delta^{i-1}f(\alpha_1,\ldots,\alpha_{i-1},\alpha_{i+1})}{\alpha_i-\alpha_{i+1}}, \tag{6}$$

for $i=2,3,\ldots,n-1$. It is well known that $\Delta^if(\alpha_1,\ldots,\alpha_i,\alpha_{i+1})$ is a symmetric function of $\alpha_1,\ldots,\alpha_{i+1}$, e.g. [16, p. 393]. For points that are not distinct, it is defined by an appropriate limit. For example, for $x\neq y$ we have

$$\Delta f(x,x) = f'(x),\qquad\Delta^2f(x,x,y) = \frac{f'(x)}{x-y}-\frac{f(x)-f(y)}{(x-y)^2}, \tag{7}$$
$$\Delta^2f(x,x,x) = \tfrac12f''(x). \tag{8}$$

Note that (8) can be obtained from (7) by setting $h\equiv y-x\to0$ and expanding $f(y)=f(x+h)=f(x)+hf'(x)+\tfrac12h^2f''(x)+O(h^3)$.
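The recursion (6) translates directly into code. The following plain-Python sketch uses a hypothetical function name. It also checks the symmetry of $\Delta^2f$ and the confluent formulas (7) and (8) for $f=\exp$ by letting points coalesce. It is a numerical illustration, not a replacement for the limits in the text.

```python
import math

def divided_difference(f, points):
    """Delta^k f(points[0], ..., points[k]) for distinct points, by the recursion in Eq. (6)."""
    if len(points) == 1:
        return f(points[0])                                   # Eq. (4)
    left = divided_difference(f, points[:-1])                 # drop the last point
    right = divided_difference(f, points[:-2] + points[-1:])  # drop the second-to-last point
    return (left - right) / (points[-2] - points[-1])

f = math.exp  # for exp, f = f' = f''
x, y, h = 0.3, 1.1, 1e-4

# Symmetry in the arguments.
sym_gap = abs(divided_difference(f, [0.2, 0.7, 1.5]) - divided_difference(f, [1.5, 0.2, 0.7]))

# Eq. (7): Delta^2 f(x, x, y), approximated by letting a second point approach x.
closed7 = math.exp(x) / (x - y) - (f(x) - f(y)) / (x - y) ** 2
approx7 = divided_difference(f, [x, x + h, y])

# Eq. (8): Delta^2 f(x, x, x) = f''(x) / 2, approximated by three nearby points.
approx8 = divided_difference(f, [x, x + h, x + 2 * h])
print(sym_gap, abs(closed7 - approx7), abs(approx8 - math.exp(x) / 2))
```

The recursive form mirrors the definition rather than aiming for efficiency. A Newton-style table would avoid recomputing lower-order differences.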

Theorem 2. Let $A=\operatorname{diag}(\alpha_1,\ldots,\alpha_n)\in\mathbb{C}^{n\times n}$ be a diagonal square matrix, and let $B=[b_{ij}]\in\mathbb{C}^{n\times n}$ be a complex square matrix. Assume that $f:\mathbb{C}\to\mathbb{C}$ satisfies one of the following conditions:

1. $f(x)$ is an analytic function in some domain $\mathcal{D}\subset\mathbb{C}$ which contains $\alpha_1,\ldots,\alpha_n$, and can be approximated uniformly in $\mathcal{D}$ by polynomials.

2. $\alpha_1,\ldots,\alpha_n$ are in a real open interval $(a,b)$ and $f$ has two continuous derivatives in $(a,b)$.

Then

$$f(A+tB) = f(A)+tL_A(B)+t^2Q_A(B)+O(t^3). \tag{9}$$

Here $L_A:\mathbb{C}^{n\times n}\to\mathbb{C}^{n\times n}$ is a linear operator, and $Q_A:\mathbb{C}^{n\times n}\to\mathbb{C}^{n\times n}$ is a quadratic homogeneous noncommutative polynomial in $B$. For $i,j=1,\ldots,n$ we have

$$[L_A(B)]_{ij} = \Delta f(\alpha_i,\alpha_j)\,b_{ij} = \frac{f(\alpha_i)-f(\alpha_j)}{\alpha_i-\alpha_j}\,b_{ij}, \tag{10}$$
$$[Q_A(B)]_{ij} = \sum_{k=1}^{n}\Delta^2f(\alpha_i,\alpha_k,\alpha_j)\,b_{ik}b_{kj}. \tag{11}$$

In particular

$$\operatorname{Tr}(L_A(B)) = \sum_{j=1}^{n}f'(\alpha_j)\,b_{jj}, \tag{12}$$
$$\operatorname{Tr}(Q_A(B)) = \sum_{i,j=1}^{n}\frac{f'(\alpha_i)-f'(\alpha_j)}{2(\alpha_i-\alpha_j)}\,b_{ij}b_{ji}. \tag{13}$$

Remark. The expansion above generalizes naturally to orders higher than the second, but for the purpose of this article we only need to expand $f(A+tB)$ up to second order in $t$. Moreover, for our purposes we only need to assume that the $\alpha_i$ are real and that condition 2 on $f$ holds. We kept condition 1 on $f$ in the theorem to be a bit more general.

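Note that in all the expressions above, the case $\alpha_i=\alpha_j$ must be understood as the limit $\alpha_j\to\alpha_i$. For example, the term

$$\frac{f'(\alpha_i)-f'(\alpha_j)}{2(\alpha_i-\alpha_j)}$$

is interpreted as $\tfrac12f''(\alpha_i)$ when $\alpha_i=\alpha_j$. In particular, if $B$ is diagonal, Eq. (13) gives the familiar second-order term of the Taylor expansion.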

Proof. By the conditions on $f$, it is enough to prove the theorem when $f$ is a polynomial. By linearity, it is enough to prove all the claims for $f(x)=x^m$. Clearly, in the expansion

$$(A+tB)^m = A^m+tL_A(B)+t^2Q_A(B)+O(t^3)$$

we must have

$$L_A(B) = \sum_{\substack{p,q\ge0\\ p+q=m-1}}A^pBA^q, \tag{14}$$
$$Q_A(B) = \sum_{\substack{p,q,r\ge0\\ p+q+r=m-2}}A^pBA^qBA^r, \tag{15}$$

where we expanded $(A+tB)^m$ up to first and second order in $t$. It remains to show that these matrices coincide with the ones defined in Eqs. (10)-(11).

Indeed, since $A$ is diagonal, the matrix elements of $L_A(B)$ in Eq. (14) are given by

$$[L_A(B)]_{ij} = \sum_{\substack{p,q\ge0\\ p+q=m-1}}\alpha_i^p\alpha_j^q\,b_{ij} = \frac{\alpha_i^m-\alpha_j^m}{\alpha_i-\alpha_j}\,b_{ij},$$

which are exactly the matrix elements given in Eq. (10).

Similarly, since $A$ is diagonal, the matrix elements of $Q_A(B)$ in Eq. (15) are given by

$$[Q_A(B)]_{ij} = \sum_{k=1}^{n}\sum_{\substack{p,q,r\ge0\\ p+q+r=m-2}}\alpha_i^p\alpha_k^q\alpha_j^r\,b_{ik}b_{kj}.$$

On the other hand, a straightforward calculation gives, for $f(x)=x^m$,

$$\Delta^2x^m(\alpha_i,\alpha_k,\alpha_j) = \sum_{\substack{p,q,r\ge0\\ p+q+r=m-2}}\alpha_i^p\alpha_k^q\alpha_j^r.$$

Thus, the expressions for $Q_A(B)$ in Eq. (11) and Eq. (15) are the same.

We now prove Eq. (13). Observe first that Eq. (11) yields

$$\operatorname{Tr}(Q_A(B)) = \sum_{i,j=1}^{n}\Delta^2f(\alpha_i,\alpha_i,\alpha_j)\,b_{ij}b_{ji}, \tag{16}$$

where we have used the symmetry $\Delta^2f(\alpha_i,\alpha_j,\alpha_i)=\Delta^2f(\alpha_i,\alpha_i,\alpha_j)$. Since $b_{ij}b_{ji}$ is symmetric under the exchange of $i$ and $j$, we can replace $\Delta^2f(\alpha_i,\alpha_i,\alpha_j)$ in Eq. (16) with

$$\tfrac12\left[\Delta^2f(\alpha_i,\alpha_i,\alpha_j)+\Delta^2f(\alpha_j,\alpha_j,\alpha_i)\right] = \tfrac12\Delta f'(\alpha_i,\alpha_j),$$

where the last equality follows from Eq. (7). Eq. (12) follows directly from Eq. (10), since $\Delta f(\alpha_j,\alpha_j)=f'(\alpha_j)$. This completes the proof. □
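Theorem 2 can be checked numerically for a Hermitian direction $B$, a real diagonal $A$ (condition 2), and $f=\exp$. The following NumPy sketch builds $L_A(B)$ and $Q_A(B)$ from Eqs. (10) and (11). It compares $f(A)+tL_A(B)+t^2Q_A(B)$ with $f(A+tB)$ computed by diagonalization, and checks the trace formulas (12) and (13). The repeated eigenvalue exercises the rule for coinciding points. All helper names are hypothetical.

```python
import numpy as np

f = np.exp           # real, smooth: condition 2 of Theorem 2
fp = fpp = np.exp    # f' and f'' coincide with f for the exponential

def dd1(a, b):
    """Delta f(a, b), with Delta f(a, a) = f'(a)."""
    return fp(a) if a == b else (f(a) - f(b)) / (a - b)

def dd2(a, b, c):
    """Delta^2 f(a, b, c), using symmetry and Eqs. (7)-(8) for repeated points."""
    if a != c:
        return (dd1(a, b) - dd1(b, c)) / (a - c)
    if a != b:                                  # a == c: reorder to (a, c, b)
        return (dd1(a, c) - dd1(c, b)) / (a - b)
    return fpp(a) / 2

def ddp(a, b):
    """Divided difference of f', used in Eq. (13)."""
    return fpp(a) if a == b else (fp(a) - fp(b)) / (a - b)

def fun_hermitian(M):
    """f(M) for Hermitian M via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * f(w)) @ U.conj().T

alpha = np.array([0.1, 0.4, 0.4, 0.9])          # real spectrum with a repeated value
n = len(alpha)
A = np.diag(alpha)
rng = np.random.default_rng(4)
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (G + G.conj().T) / 2                         # Hermitian direction

L = np.array([[dd1(alpha[i], alpha[j]) * B[i, j] for j in range(n)] for i in range(n)])  # Eq. (10)
Q = np.array([[sum(dd2(alpha[i], alpha[k], alpha[j]) * B[i, k] * B[k, j] for k in range(n))
               for j in range(n)] for i in range(n)])                                     # Eq. (11)

errs = []
for t in [1e-2, 1e-3]:
    approx = fun_hermitian(A) + t * L + t**2 * Q
    errs.append(np.linalg.norm(fun_hermitian(A + t * B) - approx))
print(errs)  # should shrink roughly like t^3

tr_q = sum(ddp(alpha[i], alpha[j]) / 2 * B[i, j] * B[j, i] for i in range(n) for j in range(n))
print(abs(np.trace(Q) - tr_q))  # Eq. (13), up to rounding
```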
We now use the above theorem to obtain the Taylor expansion of the function $S(\sigma(t))$ in a neighbourhood of $t=0$.
C. The first and second derivatives of E(x)

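We first assume that $\rho$ is non-singular. The case where $\rho$ is singular is treated separately in section IV.

Theorem 3. Let $\rho=\operatorname{diag}\{p_1,\ldots,p_n\}$ with $p_j>0$ for $j=1,\ldots,n$. Then:

$$D_yE(x) = \frac{d}{dt}S(\rho(t))\bigg|_{t=0} = -\operatorname{Tr}(\gamma_0\log\rho),$$
$$D_y^2E(x) = \frac{d^2}{dt^2}S(\rho(t))\bigg|_{t=0} = -2\left(\operatorname{Tr}[\gamma_1\log\rho]+\sum_{j,k=1}^{n}\frac{\log p_j-\log p_k}{2(p_j-p_k)}\left|(\gamma_0)_{jk}\right|^2\right). \tag{17}$$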



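Remark. The condition for $x\in\mathcal{K}$ to be critical is $D_yE(x)=0$, which is equivalent to $\operatorname{Tr}[(xy^*+yx^*)\log xx^*]=0$ for all $y\in\mathcal{K}$ such that $\operatorname{Tr}(xy^*)=0$. If $x$ is critical, we also have $D_{iy}E(x)=0$ for all $y\in x^{\perp}\subset\mathcal{K}$. Since $\operatorname{Tr}[(xy^*+yx^*)\log xx^*]=2\operatorname{Re}\operatorname{Tr}(xy^*\log xx^*)$, and replacing $y$ by $iy$ gives $2\operatorname{Im}\operatorname{Tr}(xy^*\log xx^*)$, a critical $x$ must satisfy

$$\operatorname{Tr}(xy^*\log xx^*) = 0 \tag{18}$$

for all $y\in x^{\perp}\subset\mathcal{K}$.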

Proof. Theorem 2 implies that

$$S(\rho+t\gamma_0) = S(\rho)+t\mathcal{L}_\rho(\gamma_0)+t^2\mathcal{Q}_\rho(\gamma_0)+O(t^3),$$

where $\mathcal{L}_\rho$ and $\mathcal{Q}_\rho$ are the following linear and quadratic forms:

$$\mathcal{L}_\rho(\gamma_0) = \sum_{i=1}^{n}g'(p_i)(\gamma_0)_{ii},$$
$$\mathcal{Q}_\rho(\gamma_0) = \sum_{i,j=1}^{n}\frac{g'(p_i)-g'(p_j)}{2(p_i-p_j)}(\gamma_0)_{ij}(\gamma_0)_{ji},$$

and $g(t)=-t\log t$. The expressions for $\mathcal{L}_\rho(\gamma_0)$ and $\mathcal{Q}_\rho(\gamma_0)$ above are the traces of the corresponding expressions in Theorem 2, because $S(\rho)$ is the trace of the matrix $g(\rho)=-\rho\log\rho$. Since $\gamma_0$ is Hermitian with zero trace, and $g'(t)=-1-\log t$, we get

$$\mathcal{L}_\rho(\gamma_0) = -\operatorname{Tr}(\gamma_0\log\rho),$$
$$\mathcal{Q}_\rho(\gamma_0) = -\sum_{j,k=1}^{n}\frac{\log p_j-\log p_k}{2(p_j-p_k)}\left|(\gamma_0)_{jk}\right|^2. \tag{19}$$

Combining this with Proposition 1 proves the theorem. □
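As a consistency check of Eq. (17), the formulas for $D_yE(x)$ and $D_y^2E(x)$ can be compared with central finite differences of $t\mapsto E\big((x+ty)/\sqrt{1+t^2}\big)$. The following minimal NumPy sketch assumes a hand-picked non-singular spectrum and natural logarithms. The direction $y$ is a random complex matrix, so $\mathcal{K}$ is effectively the whole matrix space here.

```python
import numpy as np

def entropy_of_entanglement(z):
    """E(z) from the squared singular values of z (natural log)."""
    s2 = np.linalg.svd(z, compute_uv=False) ** 2
    s2 = s2[s2 > 1e-15]
    return float(-np.sum(s2 * np.log(s2)))

rng = np.random.default_rng(5)
p = np.array([0.5, 0.3, 0.2])            # rho = diag(p), all p_j > 0
n = len(p)
x = np.diag(np.sqrt(p)).astype(complex)
y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
y -= np.vdot(x, y) * x                   # Tr(x y*) = 0
y /= np.linalg.norm(y)

gamma0 = x @ y.conj().T + y @ x.conj().T
gamma1 = y @ y.conj().T - x @ x.conj().T
logp = np.log(p)

# (log p_j - log p_k) / (2 (p_j - p_k)), whose diagonal limit is 1 / (2 p_j)
Pj, Pk = np.meshgrid(p, p, indexing="ij")
same = np.isclose(Pj, Pk)
coef = np.where(same, 1 / (2 * Pj), (np.log(Pj) - np.log(Pk)) / (2 * np.where(same, 1.0, Pj - Pk)))

D1 = -np.real(np.sum(np.diag(gamma0) * logp))                      # -Tr(gamma0 log rho)
D2 = -2 * (np.real(np.sum(np.diag(gamma1) * logp))
           + np.sum(coef * np.abs(gamma0) ** 2))                   # Eq. (17)

def E_along(t):
    return entropy_of_entanglement((x + t * y) / np.sqrt(1 + t**2))

h1, h2 = 1e-4, 1e-3
D1_fd = (E_along(h1) - E_along(-h1)) / (2 * h1)
D2_fd = (E_along(h2) - 2 * E_along(0.0) + E_along(-h2)) / h2**2
print(D1, D1_fd)
print(D2, D2_fd)
```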
In the following lemma, we rewrite the expression in Eq. (17) in a form that will be useful for the proof of local additivity.

Lemma 4. Denote $w=(y+y^*)/2$ and $z=i(y-y^*)/2$. Denote also $r_{jk}=\sqrt{p_j/p_k}$, where $\{p_j\}_{j=1}^n$ are the eigenvalues of $\rho=xx^*$. Then the expression in Eq. (17) for $D_y^2E(x)$ can be rewritten as

$$D_y^2E(x) = -2E(x)-\operatorname{Tr}[(yy^*+y^*y)\log\rho]-2\sum_{j,k=1}^{n}\left(|w_{jk}|^2\Phi(r_{jk})+|z_{jk}|^2\Phi(-r_{jk})\right), \tag{20}$$

where

$$\Phi(r) = \frac12\,\frac{r+1}{r-1}\log r^2,\qquad r\in\mathbb{R}, \tag{21}$$

with the identification $\Phi(1)=2$.
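Proof. The expression in Eq. (17) for $D_y^2E(x)$ involves the terms $|(\gamma_0)_{jk}|^2$. The matrix $\gamma_0=xy^*+yx^*=xy^*+yx$, where $x=\operatorname{diag}\{\sqrt{p_1},\ldots,\sqrt{p_n}\}$. Note that $y^*=w+iz$ and $y=w-iz$, where $w$ and $z$ are the Hermitian matrices defined in the lemma. Thus,

$$\gamma_0 = xw+wx+i(xz-zx).$$

In terms of the matrix elements $w_{jk}$ and $z_{jk}$ of $w$ and $z$, we have

$$(\gamma_0)_{jk} = (\sqrt{p_j}+\sqrt{p_k})\,w_{jk}+i(\sqrt{p_j}-\sqrt{p_k})\,z_{jk}.$$

The squared modulus of this expression can be written as

$$|(\gamma_0)_{jk}|^2 = (\sqrt{p_j}+\sqrt{p_k})^2|w_{jk}|^2+(\sqrt{p_j}-\sqrt{p_k})^2|z_{jk}|^2+i(p_j-p_k)(\bar w_{jk}z_{jk}-w_{jk}\bar z_{jk}),$$

where the bar denotes complex conjugation. Expressing $w$ and $z$ back in terms of $y$ gives $i(\bar w_{jk}z_{jk}-w_{jk}\bar z_{jk})=(|y_{kj}|^2-|y_{jk}|^2)/2$. We can therefore write

$$|(\gamma_0)_{jk}|^2 = (\sqrt{p_j}+\sqrt{p_k})^2|w_{jk}|^2+(\sqrt{p_j}-\sqrt{p_k})^2|z_{jk}|^2+\tfrac12(p_j-p_k)\left(|y_{kj}|^2-|y_{jk}|^2\right).$$

Substituting this expression, and $\gamma_1=yy^*-xx^*$, into Eq. (17) gives

$$-\tfrac12D_y^2E(x) = E(x)+\operatorname{Tr}[yy^*\log\rho]+\sum_{j,k=1}^{n}\log\frac{p_j}{p_k}\left[\frac{(\sqrt{p_j}+\sqrt{p_k})^2}{2(p_j-p_k)}|w_{jk}|^2+\frac{(\sqrt{p_j}-\sqrt{p_k})^2}{2(p_j-p_k)}|z_{jk}|^2+\frac14\left(|y_{kj}|^2-|y_{jk}|^2\right)\right].$$

First note that

$$\frac14\sum_{j,k=1}^{n}\log\frac{p_j}{p_k}\left(|y_{kj}|^2-|y_{jk}|^2\right) = \frac12\operatorname{Tr}[(y^*y-yy^*)\log\rho].$$

Moreover, with $r_{jk}=\sqrt{p_j/p_k}$ we get

$$\frac{(\sqrt{p_j}+\sqrt{p_k})^2}{2(p_j-p_k)}\log\frac{p_j}{p_k} = \frac12\,\frac{r_{jk}+1}{r_{jk}-1}\log r_{jk}^2 = \Phi(r_{jk}).$$

Similarly,

$$\frac{(\sqrt{p_j}-\sqrt{p_k})^2}{2(p_j-p_k)}\log\frac{p_j}{p_k} = \frac12\,\frac{r_{jk}-1}{r_{jk}+1}\log r_{jk}^2 = \Phi(-r_{jk}).$$

With this notation we get

$$-\tfrac12D_y^2E(x) = E(x)+\tfrac12\operatorname{Tr}[(yy^*+y^*y)\log\rho]+\sum_{j,k=1}^{n}\left(|w_{jk}|^2\Phi(r_{jk})+|z_{jk}|^2\Phi(-r_{jk})\right).$$

This completes the proof. □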
In the rest of the paper we use the notation

$$M_x(y) \equiv \sum_{j,k=1}^{n}\left(|w_{jk}|^2\Phi(r_{jk})+|z_{jk}|^2\Phi(-r_{jk})\right) = \operatorname{Tr}\left[w\,\Phi_\rho^+(w)+z\,\Phi_\rho^-(z)\right],$$
$$T_x(y) \equiv -E(x)-\tfrac12\operatorname{Tr}[(y^*y+yy^*)\log xx^*], \tag{22}$$

where $\Phi_\rho^\pm$ are self-adjoint linear operators defined as the Hadamard product of the input matrix with the matrix whose elements are $\Phi(\pm r_{jk})$. That is,

$$[\Phi_\rho^\pm(w)]_{jk} = \Phi(\pm r_{jk})\,w_{jk}.$$

By Eq. (20), $D_y^2E(x)=2\,(T_x(y)-M_x(y))$. Hence $D_y^2E(x)>0$ if and only if

$$M_x(y) < T_x(y). \tag{23}$$
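Combining Eqs. (20) and (22) gives $D_y^2E(x)=2\,(T_x(y)-M_x(y))$, which can be compared with a finite-difference second derivative. The sketch below assumes NumPy, natural logarithms, and $x=\operatorname{diag}(\sqrt{p_j})$ with a hand-picked spectrum. `Phi` and `M_and_T` are hypothetical helper names, and $\Phi(1)=2$ is imposed as in the text.

```python
import numpy as np

def Phi(r):
    """Phi(r) = (1/2) (r+1)/(r-1) log r^2 as in Eq. (21), with Phi(1) = 2."""
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r, 1.0)
    safe = np.where(near1, 2.0, r)  # placeholder avoids 0/0 on the masked entries
    return np.where(near1, 2.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe**2))

def M_and_T(p, y):
    """M_x(y) and T_x(y) from Eq. (22) for x = diag(sqrt(p)) with all p_j > 0."""
    R = np.sqrt(np.outer(p, 1 / p))              # r_jk = sqrt(p_j / p_k)
    w = (y + y.conj().T) / 2
    z = 1j * (y - y.conj().T) / 2
    M = np.sum(np.abs(w) ** 2 * Phi(R) + np.abs(z) ** 2 * Phi(-R))
    E = -np.sum(p * np.log(p))                   # E(x)
    d = np.real(np.diag(y.conj().T @ y + y @ y.conj().T))
    T = -E - 0.5 * np.sum(d * np.log(p))         # log(x x*) is diagonal here
    return float(M), float(T)

def entropy_of_entanglement(z):
    s2 = np.linalg.svd(z, compute_uv=False) ** 2
    s2 = s2[s2 > 1e-15]
    return float(-np.sum(s2 * np.log(s2)))

rng = np.random.default_rng(6)
p = np.array([0.5, 0.3, 0.2])
x = np.diag(np.sqrt(p)).astype(complex)
y = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
y -= np.vdot(x, y) * x                           # Tr(x y*) = 0
y /= np.linalg.norm(y)

M, T = M_and_T(p, y)
D2_lemma = 2 * (T - M)                           # Eq. (20) written as 2 (T_x(y) - M_x(y))

def E_along(t):
    return entropy_of_entanglement((x + t * y) / np.sqrt(1 + t**2))

h = 1e-3
D2_fd = (E_along(h) - 2 * E_along(0.0) + E_along(-h)) / h**2
print(D2_lemma, D2_fd)
```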
D. The complex structure and additional necessary condition

If $D_y^2E(x)>0$ for all $y$ orthogonal to $x$, then $D_{iy}^2E(x)$ is also positive, since $iy$ is orthogonal to $x$. That is,

$$M_x(iy) < T_x(iy) = T_x(y). \tag{24}$$

Therefore, Eqs. (23)-(24) imply that if $x$ is a non-degenerate local minimum then

$$\frac12\left(M_x(y)+M_x(iy)\right) = \sum_{j,k=1}^{n}|y_{jk}|^2\tilde\Phi(r_{jk}) < T_x(y), \tag{25}$$

where

$$\tilde\Phi(r) := \frac12\left(\Phi(r)+\Phi(-r)\right) = \frac12\,\frac{r^2+1}{r^2-1}\log r^2, \tag{26}$$

with the identification $\tilde\Phi(\pm1)=1$. Let $\tilde\Phi_\rho$ be the self-adjoint linear operator defined as the Hadamard product of the input matrix with the matrix whose components are $\tilde\Phi(r_{jk})$. With this notation, the necessary condition in Eq. (25) can be written as

$$\operatorname{Tr}\left[y^*\tilde\Phi_\rho(y)\right] < T_x(y). \tag{27}$$

A simple analysis of the function $\tilde\Phi$ shows that $\tilde\Phi(r)\ge1$, with equality if and only if $r=\pm1$. Since $\sum_{j,k}|y_{jk}|^2=1$, Eq. (27) therefore implies the following necessary condition on a local minimum:

$$1 < T_x(y),$$

which can be written as

$$E(y)-E(x) > 1-\frac12\left[S(yy^*\|xx^*)+S(y^*y\|xx^*)\right], \tag{28}$$

where

$$S(yy^*\|xx^*) = \operatorname{Tr}(yy^*\log yy^*)-\operatorname{Tr}(yy^*\log xx^*)$$

is the relative entropy. Since the relative entropy is nonnegative, and vanishes if and only if $yy^*=xx^*$, the right-hand side of Eq. (28) is at most 1. If both relative entropies in Eq. (28) are smaller than 1, which is possible even when $\operatorname{Tr}(xy^*)=0$, then Eq. (28) gives $E(y)>E(x)$. This is consistent with $x$ being a local minimum.
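The averaging identity in Eq. (25) and the bound $\tilde\Phi\ge1$ are easy to test numerically. The following minimal NumPy sketch assumes hypothetical helper names, natural logarithms, and a hand-picked spectrum. The identity itself does not require $y\perp x$.

```python
import numpy as np

def Phi(r):
    """Phi from Eq. (21), with Phi(1) = 2."""
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r, 1.0)
    safe = np.where(near1, 2.0, r)
    return np.where(near1, 2.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe**2))

def Phi_tilde(r):
    """Phi~ from Eq. (26): (1/2)(r^2+1)/(r^2-1) log r^2, with Phi~(+-1) = 1."""
    r2 = np.asarray(r, dtype=float) ** 2
    near1 = np.isclose(r2, 1.0)
    safe = np.where(near1, 4.0, r2)
    return np.where(near1, 1.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe))

def M_x(p, y):
    """M_x(y) from Eq. (22) for x = diag(sqrt(p))."""
    R = np.sqrt(np.outer(p, 1 / p))
    w = (y + y.conj().T) / 2
    z = 1j * (y - y.conj().T) / 2
    return float(np.sum(np.abs(w) ** 2 * Phi(R) + np.abs(z) ** 2 * Phi(-R)))

rng = np.random.default_rng(7)
p = np.array([0.6, 0.25, 0.15])
y = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
y /= np.linalg.norm(y)

R = np.sqrt(np.outer(p, 1 / p))
avg = 0.5 * (M_x(p, y) + M_x(p, 1j * y))                 # left side of Eq. (25)
weighted = float(np.sum(np.abs(y) ** 2 * Phi_tilde(R)))  # sum_jk |y_jk|^2 Phi~(r_jk)
print(avg, weighted)

grid = np.linspace(0.05, 20, 2000)
print(Phi_tilde(grid).min())  # never below 1; equality only at r = +-1
```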



III. LOCAL ADDITIVITY 



We now state the main result of this paper. 

Theorem 5. Let $x^{(1)}$ and $x^{(2)}$ be two non-degenerate local minima of $E(x)$ in $\mathcal{K}^{(1)}\subset\mathbb{C}^{n_1\times m_1}$ and $\mathcal{K}^{(2)}\subset\mathbb{C}^{n_2\times m_2}$, respectively. Then $x^{(1)}\otimes x^{(2)}$ is a non-degenerate local minimum of $E(x)$ in $\mathcal{K}^{(1)}\otimes\mathcal{K}^{(2)}$. Moreover, if $x^{(1)}$ is a degenerate local minimum and $x^{(2)}$ is a non-degenerate local minimum, then $x^{(1)}\otimes x^{(2)}$ is a degenerate local minimum.

A simpler fact in the same spirit is that if $x^{(1)}$ and $x^{(2)}$ are critical points of $E(x)$ in $\mathcal{K}^{(1)}$ and $\mathcal{K}^{(2)}$, respectively, then $x^{(1)}\otimes x^{(2)}$ is a critical point of $E(x)$ in $\mathcal{K}^{(1)}\otimes\mathcal{K}^{(2)}$. This fact was observed in […] (see also [24]) and was later stated in […]. It follows from the linearity in $y$ of the condition (18) for critical points. More precisely, if $x^{(1)}$ and $x^{(2)}$ are critical points, then by Eq. (18) $x^{(1)}\otimes x^{(2)}$ is also critical if

$$0 = \operatorname{Tr}\left[(x^{(1)}\otimes x^{(2)})\,y^*\log\left(x^{(1)}x^{(1)*}\otimes x^{(2)}x^{(2)*}\right)\right] = \operatorname{Tr}\left[x^{(1)}y^{(1)*}\log(x^{(1)}x^{(1)*})\right]+\operatorname{Tr}\left[x^{(2)}y^{(2)*}\log(x^{(2)}x^{(2)*})\right]$$

for all $y\in(x^{(1)}\otimes x^{(2)})^{\perp}$, where $y^{(1)*}=\operatorname{Tr}_2[(I\otimes x^{(2)})y^*]$ and $y^{(2)*}=\operatorname{Tr}_1[(x^{(1)}\otimes I)y^*]$. In the equation above we used the additivity of the logarithm under tensor products, $\log(A\otimes B)=\log A\otimes I+I\otimes\log B$. Moreover, since $y\in(x^{(1)}\otimes x^{(2)})^{\perp}$, we also have $y^{(1)}\in(x^{(1)})^{\perp}$ and $y^{(2)}\in(x^{(2)})^{\perp}$. Thus, if $x^{(1)}$ and $x^{(2)}$ are critical points, $x^{(1)}\otimes x^{(2)}$ is also critical [29].
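The linearity argument above rests on $\log(\rho_1\otimes\rho_2)=\log\rho_1\otimes I+I\otimes\log\rho_2$ and on the partial-trace definitions of $y^{(1)}$ and $y^{(2)}$. The following sketch checks the resulting identity numerically with NumPy. It assumes the Kronecker ordering of `np.kron` for $\mathbb{C}^{n_1}\otimes\mathbb{C}^{n_2}$. `partial_trace` and `logm_pd` are hypothetical helpers, and $x^{(1)}$, $x^{(2)}$ are random full-rank matrices rather than actual critical points, since the identity holds for any such matrices.

```python
import numpy as np

def logm_pd(a):
    """Matrix logarithm of a Hermitian positive-definite matrix via eigh."""
    w, U = np.linalg.eigh(a)
    return (U * np.log(w)) @ U.conj().T

def partial_trace(M, n1, n2, keep):
    """Partial trace of an (n1*n2) x (n1*n2) matrix: keep=1 traces out system 2, keep=2 traces out system 1."""
    M4 = M.reshape(n1, n2, n1, n2)  # M4[i, k, j, l] = M[i*n2 + k, j*n2 + l]
    return np.einsum("ikjk->ij", M4) if keep == 1 else np.einsum("ikil->kl", M4)

rng = np.random.default_rng(8)
n1, n2 = 2, 3

def rand(n, m):
    return rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))

x1 = rand(n1, n1); x1 /= np.linalg.norm(x1)  # full rank with probability one
x2 = rand(n2, n2); x2 /= np.linalg.norm(x2)
x = np.kron(x1, x2)
y = rand(n1 * n2, n1 * n2)
y -= np.vdot(x, y) * x                       # y in (x1 (x) x2)^perp
y /= np.linalg.norm(y)

rho1, rho2 = x1 @ x1.conj().T, x2 @ x2.conj().T
ys = y.conj().T
y1s = partial_trace(np.kron(np.eye(n1), x2) @ ys, n1, n2, keep=1)  # y^(1)* = Tr_2[(I (x) x2) y*]
y2s = partial_trace(np.kron(x1, np.eye(n2)) @ ys, n1, n2, keep=2)  # y^(2)* = Tr_1[(x1 (x) I) y*]

lhs = np.trace(x @ ys @ logm_pd(np.kron(rho1, rho2)))
rhs = np.trace(x1 @ y1s @ logm_pd(rho1)) + np.trace(x2 @ y2s @ logm_pd(rho2))
print(abs(lhs - rhs))                                   # the linear identity above
print(abs(np.trace(x1 @ y1s)), abs(np.trace(x2 @ y2s)))  # y^(i) is orthogonal to x^(i)
```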

In the following subsection we provide one of the main ingredients for the local additivity of the minimum von Neumann entropy output of a quantum channel.
A. The Subadditivity of $\Phi$



Lemma 6. Let $\Phi,\tilde\Phi:\mathbb{R}\to\mathbb{R}$ be defined as in Eq. (21) and Eq. (26), respectively. Then, for any nonzero $r,s\in\mathbb{R}$,

$$\Phi(rs)\le\tilde\Phi(r)+\tilde\Phi(s), \tag{29}$$

with equality if and only if $r=s$. In the operator language of Eqs. (22) and (27), the inequality (29) can be expressed as

$$\Phi^{\pm}_{\rho_A\otimes\rho_B}\le\tilde\Phi_{\rho_A}\otimes I_B+I_A\otimes\tilde\Phi_{\rho_B}, \tag{30}$$

where, for two operators, $O_1\le O_2$ if and only if $\operatorname{Tr}[y^*O_1(y)]\le\operatorname{Tr}[y^*O_2(y)]$ for all $y$.



Proof. We need to prove that

$$\frac12\,\frac{rs+1}{rs-1}\log(r^2s^2)\le\frac12\,\frac{r^2+1}{r^2-1}\log r^2+\frac12\,\frac{s^2+1}{s^2-1}\log s^2.$$

Using $\log(r^2s^2)=\log r^2+\log s^2$, this inequality is equivalent to

$$\left(\frac{r^2+1}{r^2-1}-\frac{rs+1}{rs-1}\right)\log r^2+\left(\frac{s^2+1}{s^2-1}-\frac{rs+1}{rs-1}\right)\log s^2\ge0.$$

Since $\frac{r^2+1}{r^2-1}-\frac{rs+1}{rs-1}=\frac{2r(s-r)}{(r^2-1)(rs-1)}$, and similarly with $r$ and $s$ exchanged, this is equivalent to

$$\frac{s-r}{rs-1}\left(f(r)-f(s)\right)\ge0, \tag{31}$$

where

$$f(r) = \frac{r}{r^2-1}\log r^2.$$

That is, we need to prove that $f(r)\ge f(s)$ if $(s-r)/(rs-1)>0$ and $f(r)\le f(s)$ if $(s-r)/(rs-1)<0$. The two cases are equivalent by the symmetry under exchange of $r$ and $s$, so without loss of generality we assume $(s-r)/(rs-1)>0$. This inequality holds if (a) $s>r$ and $rs>1$, or (b) $s<r$ and $rs<1$. A simple analysis of the function $f(r)$ shows that $f$ is odd, monotonically increasing for $-1<r<1$, and monotonically decreasing for $|r|>1$. Moreover, $f(1/r)=f(r)$.

Consider case (a). If $s>r>1$, then $f(r)>f(s)$, since $f$ is monotonically decreasing in this region. In the same way, if $-1>s>r$ then $f(r)>f(s)$. Another possibility in this case is $0<r<1<1/r<s$. Both $r$ and $1/s$ are then positive and smaller than 1, with $r>1/s$, so $f(r)>f(1/s)=f(s)$, because $f$ is monotonically increasing for $|r|<1$. The last possibility in this case is $1/r>s>-1>r$. Here both $s$ and $1/r$ are negative numbers bigger than $-1$, a region in which $f$ is monotonically increasing. Thus, $f(r)=f(1/r)>f(s)$.

Consider case (b). If $s<0<r$ then $f(s)<0<f(r)$, and if $-1<s<r<1$ then $f(r)>f(s)$, since $f$ is monotonically increasing in this region. Another possibility in this case is $0<s<1<r<1/s$. Both $r$ and $1/s$ are then positive and bigger than 1, with $r<1/s$, so $f(r)>f(1/s)=f(s)$, because $f$ is monotonically decreasing for $r>1$. The last possibility in this case is $1/r<s<-1<r<0$. Here both $s$ and $1/r$ are negative numbers smaller than $-1$, a region in which $f$ is monotonically decreasing. Thus, $f(r)=f(1/r)>f(s)$.

To prove the equality conditions, we need to show that the expression in Eq. (31) equals zero if and only if $s=r$. First consider the case $r=1/s$. Then $\Phi(rs)=\Phi(1)=2$ and $\tilde\Phi(s)=\tilde\Phi(1/r)=\tilde\Phi(r)$. Hence, if $r=1/s$, equality in Eq. (29) holds if and only if $\tilde\Phi(r)=1$. As pointed out earlier, $\tilde\Phi(r)=1$ if and only if $r=\pm1$. We therefore conclude that if $r=1/s$, equality in Eq. (29) holds if and only if $r=s=\pm1$. Now assume $rs\neq1$. In this case, the expression in Eq. (31) equals zero if and only if $r=s$ or $f(r)=f(s)$. A simple analysis of the function $f$ shows that $f(r)=f(s)$ if and only if $r=s$ or $r=1/s$. Since we assumed $rs\neq1$, we get $r=s$. This completes the proof. □
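Lemma 6 can also be probed numerically. The NumPy sketch below samples random nonzero $r,s$ and checks $\Phi(\pm rs)\le\tilde\Phi(r)+\tilde\Phi(s)$. It also checks equality at $r=s$ and verifies the factorization underlying Eq. (31), $\tilde\Phi(r)+\tilde\Phi(s)-\Phi(rs)=\frac{s-r}{rs-1}\,(f(r)-f(s))$, away from the singular sets. This is a sanity check under these sampling assumptions, not a proof. The helper names are hypothetical.

```python
import numpy as np

def Phi(r):
    """Phi from Eq. (21), with Phi(1) = 2."""
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r, 1.0)
    safe = np.where(near1, 2.0, r)
    return np.where(near1, 2.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe**2))

def Phi_tilde(r):
    """Phi~ from Eq. (26), with Phi~(+-1) = 1."""
    r2 = np.asarray(r, dtype=float) ** 2
    near1 = np.isclose(r2, 1.0)
    safe = np.where(near1, 4.0, r2)
    return np.where(near1, 1.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe))

def f(r):
    """f(r) = r/(r^2-1) log r^2 from the proof, with f(+-1) = +-1."""
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r**2, 1.0)
    safe = np.where(near1, 4.0, r**2)
    return np.where(near1, np.sign(r), r / (safe - 1) * np.log(safe))

rng = np.random.default_rng(9)
r = rng.uniform(-6, 6, size=200_000)
s = rng.uniform(-6, 6, size=200_000)
keep = (np.abs(r) > 1e-3) & (np.abs(s) > 1e-3)          # the logarithms need r, s != 0
r, s = r[keep], s[keep]

gap_plus = Phi_tilde(r) + Phi_tilde(s) - Phi(r * s)     # Eq. (29)
gap_minus = Phi_tilde(r) + Phi_tilde(s) - Phi(-r * s)   # same bound for -rs, since Phi~ is even
print(gap_plus.min(), gap_minus.min())

# Equality case r = s.
t = rng.uniform(0.1, 5, size=1000) * rng.choice([-1.0, 1.0], size=1000)
eq_gap = np.abs(2 * Phi_tilde(t) - Phi(t * t)).max()

# Factorization behind Eq. (31), away from rs = 1 and r^2 = 1, s^2 = 1.
ok = (np.abs(r * s - 1) > 1e-2) & (np.abs(r**2 - 1) > 1e-2) & (np.abs(s**2 - 1) > 1e-2)
ro, so = r[ok], s[ok]
factored = (so - ro) / (ro * so - 1) * (f(ro) - f(so))
fac_gap = np.abs(gap_plus[ok] - factored).max()
print(eq_gap, fac_gap)
```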
B. Proof of Theorem 5



We can assume without loss of generality that $n_1=m_1$ and $n_2=m_2$, since this can always be achieved by adding zero rows or columns. In this part of the proof we also assume that both $x^{(1)}$ and $x^{(2)}$ are non-singular; the singular case is treated separately in section IV. By the singular value decomposition (see the argument below Definition 1), we can assume without loss of generality that $x^{(1)}=\operatorname{diag}\{\sqrt{p_1},\ldots,\sqrt{p_{n_1}}\}$ and $x^{(2)}=\operatorname{diag}\{\sqrt{q_1},\ldots,\sqrt{q_{n_2}}\}$, where the $p_i$ and $q_j$ are positive and $\sum_{i=1}^{n_1}p_i=\sum_{j=1}^{n_2}q_j=1$.

We first assume that both $x^{(1)}$ and $x^{(2)}$ are non-degenerate local minima. We need to show that $D_y^2E(x)>0$ for all $y\in x^{\perp}$, where $x=x^{(1)}\otimes x^{(2)}$. The most general $y\in(x^{(1)}\otimes x^{(2)})^{\perp}$ can be written as

$$y = c_1\,x^{(1)}\otimes y^{(2)}+c_2\,y^{(1)}\otimes x^{(2)}+c_3\,y', \tag{32}$$

where $y^{(1)}\in(x^{(1)})^{\perp}$, $y^{(2)}\in(x^{(2)})^{\perp}$, and $y'\in(x^{(1)})^{\perp}\otimes(x^{(2)})^{\perp}$ are all normalized. The numbers $c_i$ can be chosen to be real, because their phases can be absorbed into $y^{(1)}$, $y^{(2)}$, and $y'$. They also satisfy $c_1^2+c_2^2+c_3^2=1$, so that $y$ is normalized.

Consider first the simple case $y=x^{(1)}\otimes y^{(2)}$. In this case,

$$E\left(\frac{x+ty}{\sqrt{1+t^2}}\right) = E\left(x^{(1)}\otimes\frac{x^{(2)}+ty^{(2)}}{\sqrt{1+t^2}}\right) = E(x^{(1)})+E\left(\frac{x^{(2)}+ty^{(2)}}{\sqrt{1+t^2}}\right). \tag{33}$$

Since $x^{(2)}$ is a non-degenerate local minimum, we must have $D_y^2E(x)>0$. The case $y=y^{(1)}\otimes x^{(2)}$ is similar.
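Consider now the case $y\in(x^{(1)})^{\perp}\otimes(x^{(2)})^{\perp}$. Using its Schmidt decomposition, we can write it as

$$y = \sum_i c_i\,y_i^{(1)}\otimes y_i^{(2)}, \tag{34}$$

where $y_i^{(1)}\in(x^{(1)})^{\perp}$, $y_i^{(2)}\in(x^{(2)})^{\perp}$,

$$\operatorname{Tr}\left[y_i^{(1)}y_j^{(1)*}\right] = \operatorname{Tr}\left[y_i^{(2)}y_j^{(2)*}\right] = \delta_{ij}, \tag{35}$$

and the $c_i$ are real numbers such that $\sum_ic_i^2=1$.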
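By definition we have

$$M_x(y) = \operatorname{Tr}\left[w^{AB}\,\Phi^+_{\rho_A\otimes\rho_B}(w^{AB})+z^{AB}\,\Phi^-_{\rho_A\otimes\rho_B}(z^{AB})\right], \tag{36}$$

where $w^{AB}=(y^*+y)/2$, $z^{AB}=i(y-y^*)/2$, $\rho_A=x^{(1)}x^{(1)*}$ and $\rho_B=x^{(2)}x^{(2)*}$. Applying Lemma 6 to both $\Phi^{\pm}_{\rho_A\otimes\rho_B}$ gives

$$M_x(y) \le \operatorname{Tr}\left[w^{AB}(\tilde\Phi_{\rho_A}\otimes I_B)(w^{AB})+w^{AB}(I_A\otimes\tilde\Phi_{\rho_B})(w^{AB})+z^{AB}(\tilde\Phi_{\rho_A}\otimes I_B)(z^{AB})+z^{AB}(I_A\otimes\tilde\Phi_{\rho_B})(z^{AB})\right]$$
$$= \operatorname{Tr}\left[y^*(\tilde\Phi_{\rho_A}\otimes I_B)(y)+y^*(I_A\otimes\tilde\Phi_{\rho_B})(y)\right],$$

where $I_A$ and $I_B$ are the identity matrices on the respective spaces, and the last equality uses the definitions $w^{AB}=(y^*+y)/2$ and $z^{AB}=i(y-y^*)/2$. Substituting Eq. (34) into the above equation, we get

$$M_x(y) \le \sum_ic_i^2\operatorname{Tr}\left[y_i^{(1)*}\tilde\Phi_{\rho_A}(y_i^{(1)})+y_i^{(2)*}\tilde\Phi_{\rho_B}(y_i^{(2)})\right],$$

where we have used the orthogonality relations in Eq. (35). Combining this with Eq. (27) gives

$$M_x(y) < \sum_ic_i^2\left(T_{x^{(1)}}(y_i^{(1)})+T_{x^{(2)}}(y_i^{(2)})\right) = T_x(y), \tag{37}$$

where the last equality follows from the orthogonality relations in Eq. (35) and the fact that

$$\log xx^* = \log x^{(1)}x^{(1)*}\otimes I_B+I_A\otimes\log x^{(2)}x^{(2)*}. \tag{38}$$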
This completes the proof for $y\in(x^{(1)})^{\perp}\otimes(x^{(2)})^{\perp}$.
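For a single product direction $y=y^{(1)}\otimes y^{(2)}$ (one term of Eq. (34)), the last equality in Eq. (37) reduces to $T_x(y)=T_{x^{(1)}}(y^{(1)})+T_{x^{(2)}}(y^{(2)})$. The bound from Lemma 6 then reads $M_x(y)\le\operatorname{Tr}[y^{(1)*}\tilde\Phi_{\rho_A}(y^{(1)})]+\operatorname{Tr}[y^{(2)*}\tilde\Phi_{\rho_B}(y^{(2)})]$. The NumPy sketch below checks both. It assumes hand-picked spectra, the `np.kron` ordering for $\rho_A\otimes\rho_B$, natural logarithms, and hypothetical helper names.

```python
import numpy as np

def Phi(r):
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r, 1.0)
    safe = np.where(near1, 2.0, r)
    return np.where(near1, 2.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe**2))

def Phi_tilde(r):
    r2 = np.asarray(r, dtype=float) ** 2
    near1 = np.isclose(r2, 1.0)
    safe = np.where(near1, 4.0, r2)
    return np.where(near1, 1.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe))

def M_x(p, y):
    """M_x(y) of Eq. (22) for x = diag(sqrt(p))."""
    R = np.sqrt(np.outer(p, 1 / p))
    w, z = (y + y.conj().T) / 2, 1j * (y - y.conj().T) / 2
    return float(np.sum(np.abs(w) ** 2 * Phi(R) + np.abs(z) ** 2 * Phi(-R)))

def T_x(p, y):
    """T_x(y) of Eq. (22) for x = diag(sqrt(p))."""
    d = np.real(np.diag(y.conj().T @ y + y @ y.conj().T))
    return float(np.sum(p * np.log(p)) - 0.5 * np.sum(d * np.log(p)))

def quad_tilde(p, y):
    """Tr[y* Phi~_rho(y)] = sum_jk |y_jk|^2 Phi~(r_jk)."""
    R = np.sqrt(np.outer(p, 1 / p))
    return float(np.sum(np.abs(y) ** 2 * Phi_tilde(R)))

rng = np.random.default_rng(10)
p, q = np.array([0.7, 0.3]), np.array([0.5, 0.3, 0.2])
x1, x2 = np.diag(np.sqrt(p)), np.diag(np.sqrt(q))

def orth_unit(x, shape):
    """Random normalized matrix orthogonal to x (x has unit norm)."""
    y = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    y -= np.vdot(x, y) * x
    return y / np.linalg.norm(y)

y1, y2 = orth_unit(x1, (2, 2)), orth_unit(x2, (3, 3))
pq = np.kron(p, q)   # spectrum of rho_A (x) rho_B in np.kron ordering
y = np.kron(y1, y2)  # product direction in (x1)^perp (x) (x2)^perp

print(T_x(pq, y), T_x(p, y1) + T_x(q, y2))                # equal: T splits over the factors
print(M_x(pq, y), quad_tilde(p, y1) + quad_tilde(q, y2))  # Lemma 6: first <= second
```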

Consider now the most general case, where $y\in x^{\perp}$ has the form given in Eq. (32). Denote

$$w^{AB} \equiv \frac12(y^*+y) = c_1\,x^{(1)}\otimes w^{(2)}+c_2\,w^{(1)}\otimes x^{(2)}+c_3\,w', \tag{39}$$

where $w'=(y'^*+y')/2$, and we have used

$$\frac12\left(x^{(1)*}\otimes y^{(2)*}+x^{(1)}\otimes y^{(2)}\right) = x^{(1)}\otimes\frac12\left(y^{(2)*}+y^{(2)}\right) = x^{(1)}\otimes w^{(2)},$$
$$\frac12\left(y^{(1)*}\otimes x^{(2)*}+y^{(1)}\otimes x^{(2)}\right) = \frac12\left(y^{(1)*}+y^{(1)}\right)\otimes x^{(2)} = w^{(1)}\otimes x^{(2)}.$$

These equations use the fact that $x^{(1)}$ and $x^{(2)}$ are square diagonal matrices with their (real) singular values on the diagonal, so that $x^{(1)*}=x^{(1)}$ and $x^{(2)*}=x^{(2)}$. We now substitute the expression for $w^{AB}$ in Eq. (39) into the expression for $M_x(y)$ in Eq. (36). This produces several cross terms, and we argue that they all vanish. Consider, for example, the cross term

$$c_1c_3\operatorname{Tr}\left[(x^{(1)}\otimes w^{(2)})\,\Phi^+_{\rho_A\otimes\rho_B}(w')\right],$$

and recall that $\rho_A=x^{(1)}x^{(1)*}$ and $\rho_B=x^{(2)}x^{(2)*}$. Since $\Phi^+_{\rho_A\otimes\rho_B}$ is self-adjoint, this expression can be written as

$$c_1c_3\operatorname{Tr}\left[w'\,\Phi^+_{\rho_A\otimes\rho_B}(x^{(1)}\otimes w^{(2)})\right] = c_1c_3\operatorname{Tr}\left[w'\left(x^{(1)}\otimes\Phi^+_{\rho_B}(w^{(2)})\right)\right],$$

where the last equality uses the identity $\Phi^+_{\rho_A\otimes\rho_B}(x^{(1)}\otimes w^{(2)})=x^{(1)}\otimes\Phi^+_{\rho_B}(w^{(2)})$. This identity follows from the definition of $\Phi^+$ in a basis in which $x^{(1)}$ is diagonal. Since the partial trace $\operatorname{Tr}_1[w'(x^{(1)}\otimes B)]=0$ for all matrices $B$, we have

$$c_1c_3\operatorname{Tr}\left[(x^{(1)}\otimes w^{(2)})\,\Phi^+_{\rho_A\otimes\rho_B}(w')\right] = 0.$$

In the same way, all the other cross terms vanish. Moreover, denote

$$z^{AB} \equiv \frac{i}{2}(y-y^*) = c_1\,x^{(1)}\otimes z^{(2)}+c_2\,z^{(1)}\otimes x^{(2)}+c_3\,z',$$

where $z^{(1)}$, $z^{(2)}$, and $z'$ are defined analogously to $w^{(1)}$, $w^{(2)}$, and $w'$. Substituting this expression for $z^{AB}$ into Eq. (36) also leads to vanishing cross terms. To summarize, substituting the above expressions for $z^{AB}$ and $w^{AB}$ into Eq. (36) gives

$$M_x(y) = c_1^2M_x(x^{(1)}\otimes y^{(2)})+c_2^2M_x(y^{(1)}\otimes x^{(2)})+c_3^2M_x(y').$$

We have already proved that $M_x<T_x$, i.e. that $x$ is a non-degenerate local minimum, in the directions $x^{(1)}\otimes y^{(2)}$, $y^{(1)}\otimes x^{(2)}$, and $y'$. Hence

$$M_x(y) < c_1^2T_x(x^{(1)}\otimes y^{(2)})+c_2^2T_x(y^{(1)}\otimes x^{(2)})+c_3^2T_x(y'). \tag{40}$$

Now note the partial-trace orthogonality relations $\operatorname{Tr}_1[(x^{(1)}\otimes y^{(2)})(y')^*]=0$ and $\operatorname{Tr}_2[(y^{(1)}\otimes x^{(2)})(y')^*]=0$. These relations, together with Eq. (38) and the criticality condition (18) for $x^{(1)}$ and $x^{(2)}$, show that the right-hand side of Eq. (40) equals $T_x(y)$. This completes the proof of the main part of the theorem.
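The vanishing of the cross terms can be checked numerically. For $y$ of the form (32) with real $c_i$, $M_x(y)$ should equal $c_1^2M_x(x^{(1)}\otimes y^{(2)})+c_2^2M_x(y^{(1)}\otimes x^{(2)})+c_3^2M_x(y')$ up to rounding. The sketch assumes NumPy, diagonal $x^{(1)}$ and $x^{(2)}$ with hand-picked spectra, and a $y'$ built as a sum of two product terms from $(x^{(1)})^{\perp}\otimes(x^{(2)})^{\perp}$. All helper names are hypothetical.

```python
import numpy as np

def Phi(r):
    """Phi from Eq. (21), with Phi(1) = 2."""
    r = np.asarray(r, dtype=float)
    near1 = np.isclose(r, 1.0)
    safe = np.where(near1, 2.0, r)
    return np.where(near1, 2.0, 0.5 * (safe + 1) / (safe - 1) * np.log(safe**2))

def M_x(p, y):
    """M_x(y) of Eq. (22) for x = diag(sqrt(p))."""
    R = np.sqrt(np.outer(p, 1 / p))
    w, z = (y + y.conj().T) / 2, 1j * (y - y.conj().T) / 2
    return float(np.sum(np.abs(w) ** 2 * Phi(R) + np.abs(z) ** 2 * Phi(-R)))

rng = np.random.default_rng(11)
p, q = np.array([0.6, 0.4]), np.array([0.5, 0.3, 0.2])
x1, x2 = np.diag(np.sqrt(p)), np.diag(np.sqrt(q))
pq = np.kron(p, q)

def orth_unit(x, shape):
    """Random normalized matrix orthogonal to x (x has unit norm)."""
    y = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    y -= np.vdot(x, y) * x
    return y / np.linalg.norm(y)

y1, y2 = orth_unit(x1, (2, 2)), orth_unit(x2, (3, 3))
# y' in (x1)^perp (x) (x2)^perp: a normalized sum of two product terms
yp = (np.kron(orth_unit(x1, (2, 2)), orth_unit(x2, (3, 3)))
      + np.kron(orth_unit(x1, (2, 2)), orth_unit(x2, (3, 3))))
yp /= np.linalg.norm(yp)

c = np.array([0.5, 0.6, np.sqrt(1 - 0.25 - 0.36)])  # real, with c1^2 + c2^2 + c3^2 = 1
u, v = np.kron(x1, y2), np.kron(y1, x2)
y = c[0] * u + c[1] * v + c[2] * yp                 # Eq. (32)

lhs = M_x(pq, y)
rhs = c[0] ** 2 * M_x(pq, u) + c[1] ** 2 * M_x(pq, v) + c[2] ** 2 * M_x(pq, yp)
print(abs(lhs - rhs))  # the cross terms cancel, up to rounding
```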

To prove the second part of the theorem, assume that $x^{(1)}$ is a degenerate local minimum and $x^{(2)}$ is a non-degenerate local minimum. Following exactly the same lines as in the proof above, we get $M_x(y')<T_x(y')$ for $y'\in(x^{(1)})^{\perp}\otimes(x^{(2)})^{\perp}$. This is clear from Eq. (37) and the inequality preceding it, where we use the fact that

$$\operatorname{Tr}\left[y_i^{(2)*}\tilde\Phi_{\rho_B}(y_i^{(2)})\right] < T_{x^{(2)}}(y_i^{(2)}),$$

since $x^{(2)}$ is a non-degenerate local minimum. Similarly, if $y=x^{(1)}\otimes y^{(2)}$ we get $M_x(y)<T_x(y)$. The only $y\in x^{\perp}$ for which $M_x(y)=T_x(y)$ is possible is $y=y^{(1)}\otimes x^{(2)}$. In this case, however,

$$E\left(\frac{x+ty}{\sqrt{1+t^2}}\right) = E\left(\frac{x^{(1)}+ty^{(1)}}{\sqrt{1+t^2}}\right)+E(x^{(2)}), \tag{41}$$

so $x$ is a local minimum in this direction as well. Hence, $x$ is a degenerate local minimum. This completes the proof of the second part of the theorem.
IV. THE SINGULAR CASE



In section II we derived the first and second directional derivatives $D_yE(x)$ and $D_y^2E(x)$ assuming $x$ is non-singular. In this section we consider the case where $x$ is singular. The expression for $D_yE(x)$ is the same as in the non-singular case, but the expression for the second derivative is not. In particular, in the singular case the second derivative $D_y^2E(x)=\frac{d^2}{dt^2}S(\rho(t))\big|_{t=0}$ may diverge. Nevertheless, we will see in this section that $E$ is locally additive even if $x$ is singular.

For simplicity of exposition, we consider here subspaces $\mathcal{K}\subset\mathbb{C}^n\otimes\mathbb{C}^m$ with $n=m$, since we can always embed $\mathcal{K}$ in $\mathbb{C}^{\max\{n,m\}}\otimes\mathbb{C}^{\max\{n,m\}}$. The following theorem gives the criterion for the divergence of the second derivative.

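Theorem 7. Let $x, y \in \mathcal{K} \subset \mathbb{C}^{n\times n}$ with $\mathrm{Tr}\,xx^* = \mathrm{Tr}\,yy^* = 1$ and $\mathrm{Tr}(xy^*) = 0$. Change the standard orthonormal bases in $\mathbb{C}^n$ to new orthonormal bases such that $x$ and $y$ have the forms

$$ x = \begin{pmatrix} x_{11} & 0_{r,n-r} \\ 0_{n-r,r} & 0_{n-r,n-r} \end{pmatrix}, \qquad y = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix}, \tag{42} $$

where $r$ is the rank of $x$, $0_{i,j}$ are $i\times j$ zero matrices, and $x_{11}, y_{11} \in \mathbb{C}^{r\times r}$. Then

$$ S(\rho(t)) = f(t) - \big(K + t\,g(t)\big)\,t^2\log t^2, \qquad K = \mathrm{Tr}(y_{22}y_{22}^*), \tag{43} $$

where $f(t), g(t)$ are analytic functions in a neighbourhood of $0$. Hence $D^2_yE(x) = +\infty$ if and only if $y_{22} \neq 0$. Furthermore, if $y_{22} = 0$, then either $g(t) = 0$ or $g(t) = a\,t^{2k-1}(1+O(t))$, where $a > 0$ and $k$ is a positive integer.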
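A minimal $2\times 2$ example, chosen by us for illustration (it is not taken from the paper), makes the role of $y_{22}$ visible. Take $x = \mathrm{diag}(1,0)$, so $r = 1$, and $y = \begin{pmatrix} 0 & a \\ b & c\end{pmatrix}$ with $|a|^2+|b|^2+|c|^2 = 1$, so that $y_{22} = c$ and $\mathrm{Tr}(xy^*) = 0$. Since $\mathrm{Tr}\,\rho(t) = 1$ and $\det\rho(t) = |\det(x+ty)|^2/(1+t^2)^2$ with $\det(x+ty) = tc - t^2ab$, the small eigenvalue of $\rho(t)$ behaves like $|c|^2t^2$, which produces the $-Kt^2\log t^2$ term of (43) with $K = |c|^2$; when $c = 0$ it drops to order $t^4$. The helper name `small_eigenvalue` is ours.

```python
import math

def small_eigenvalue(t, a, b, c):
    """Smaller eigenvalue of rho(t) = (x+ty)(x+ty)*/(1+t^2) for the toy example
    x = diag(1, 0), y = [[0, a], [b, c]] with |a|^2 + |b|^2 + |c|^2 = 1."""
    det_m = t * c - t * t * a * b            # det(x + t y)
    d = abs(det_m) ** 2 / (1 + t * t) ** 2   # det rho(t); note Tr rho(t) = 1
    big = (1 + math.sqrt(1 - 4 * d)) / 2     # larger eigenvalue
    return d / big                           # smaller one, via the product of the roots

t = 1e-4
s = 1 / math.sqrt(3)   # y22 = c = 1/sqrt(3), so K = |c|^2 = 1/3
ratio_nonzero = small_eigenvalue(t, s, s, s) / t ** 2
h = 1 / math.sqrt(2)   # y22 = 0: the small eigenvalue is of order t^4
ratio_zero = small_eigenvalue(t, h, h, 0.0) / t ** 2
print(ratio_nonzero)   # close to 1/3
print(ratio_zero)      # close to 0
```

Writing the small eigenvalue as $\det/\lambda_{\max}$ avoids the cancellation that the direct formula $(1-\sqrt{1-4d})/2$ would suffer for tiny $d$.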

A much weaker version of the theorem above can be found in [10]. For clarity of exposition, we defer the proof of Theorem 7 to Appendix A.

From the theorem above it follows that w.l.o.g. we can set $y_{22} = 0$, since otherwise the second derivative is $+\infty$. This will be useful when proving local additivity in the singular case. However, in the tensor product space, $y$ can be written as in Eq. (34). Hence, while we assume that the (2,2) block of the bipartite state $y$ is zero, it is not immediately obvious that the (2,2) blocks of the one-party states $y_j^{(1)}$ and $y_j^{(2)}$ are also zero. Nevertheless, this is indeed the case, as we now show.



A. Tensor product structure in the singular case 
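Let $\mathcal{K} \subset \mathbb{C}^{n\times n}$ be a subspace of matrices that are partitioned as in Eq. (42). We assume that $\mathcal{K}$ contains a matrix

$$ x = \begin{pmatrix} x_{11} & 0 \\ 0 & 0 \end{pmatrix}, \qquad \mathrm{Tr}(x_{11}x_{11}^*) = 1. \tag{44} $$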
We now choose an orthonormal basis $x_1,\dots,x_p, y_1,\dots,y_q, z_1,\dots,z_r, w_1,\dots,w_s$ of $\mathcal{K}$ as follows. First, $x_1 = x$. Then

1. $x_1,\dots,x_p$ is an orthonormal basis of the subspace of $\mathcal{K}$ of matrices of the form $\begin{pmatrix} * & 0 \\ 0 & 0 \end{pmatrix}$. (It is possible that $p = 1$.)

2. $x_1,\dots,x_p, y_1,\dots,y_q$ is an orthonormal basis of the subspace of $\mathcal{K}$ of matrices of the form $\begin{pmatrix} * & * \\ 0 & 0 \end{pmatrix}$. (It is possible that $q = 0$.)

3. $x_1,\dots,x_p, y_1,\dots,y_q, z_1,\dots,z_r$ is an orthonormal basis of the subspace of $\mathcal{K}$ of matrices of the form $\begin{pmatrix} * & * \\ * & 0 \end{pmatrix}$. (It is possible that $r = 0$.)

4. $x_1,\dots,x_p, y_1,\dots,y_q, z_1,\dots,z_r, w_1,\dots,w_s$ is an orthonormal basis of $\mathcal{K}$. (It is possible that $s = 0$.)

We observe the following:

1. The projections of $x_1,\dots,x_p$ on the block (1,1) are linearly independent.

2. The projections of $y_1,\dots,y_q$ on the block (1,2) are linearly independent if $q \ge 1$.

3. The projections of $z_1,\dots,z_r$ on the block (2,1) are linearly independent if $r \ge 1$.

4. The projections of $w_1,\dots,w_s$ on the block (2,2) are linearly independent if $s \ge 1$.

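We now consider two subspaces $\mathcal{K}_i \subset \mathbb{C}^{n_i\times n_i}$ for $i = 1,2$. We consider here the most complicated case, in which both matrices $x^{(1)} \in \mathcal{K}_1$ and $x^{(2)} \in \mathcal{K}_2$ are singular. So we assume that each $x^{(i)}$ has the form (44). For $i = 1,2$ we form orthonormal bases

$$ x^{(i)}_1,\dots,x^{(i)}_{p_i},\; y^{(i)}_1,\dots,y^{(i)}_{q_i},\; z^{(i)}_1,\dots,z^{(i)}_{r_i},\; w^{(i)}_1,\dots,w^{(i)}_{s_i} \tag{45} $$

exactly as above. We now form the tensor product $\mathcal{K}_1\otimes\mathcal{K}_2$ with respect to the partitions of $\mathcal{K}_1, \mathcal{K}_2$ above. Let

$$ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \in \mathcal{K}_1, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \in \mathcal{K}_2. \tag{46} $$

We then agree that the partition of $\mathcal{K}_1\otimes\mathcal{K}_2$ is the following partition of $A\otimes B$:

$$ A\otimes B = \begin{pmatrix}
A_{11}\otimes B_{11} & A_{11}\otimes B_{12} & A_{12}\otimes B_{11} & A_{12}\otimes B_{12} \\
A_{11}\otimes B_{21} & A_{11}\otimes B_{22} & A_{12}\otimes B_{21} & A_{12}\otimes B_{22} \\
A_{21}\otimes B_{11} & A_{21}\otimes B_{12} & A_{22}\otimes B_{11} & A_{22}\otimes B_{12} \\
A_{21}\otimes B_{21} & A_{21}\otimes B_{22} & A_{22}\otimes B_{21} & A_{22}\otimes B_{22}
\end{pmatrix}. \tag{47} $$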
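The partition (47) groups the rows (and columns) of $A\otimes B$ by the pair (block of $A$, block of $B$), in the order (1,1), (1,2), (2,1), (2,2), so that $C_{ij} = A_{aa'}\otimes B_{bb'}$ when $i$ corresponds to $(a,b)$ and $j$ to $(a',b')$. The sketch below, with small integer matrices of our own choosing and hypothetical helper names, builds the Kronecker product in the standard ordering, regroups its indices, and checks this rule for all sixteen blocks.

```python
ORDER = [(1, 1), (1, 2), (2, 1), (2, 2)]  # block index 1..4 <-> (block of A, block of B)

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def sub(M, rows, cols):
    """Submatrix of M with the given row and column indices."""
    return [[M[i][j] for j in cols] for i in rows]

def parts(n, r):
    """Index sets of the two diagonal blocks of an n x n matrix split after r."""
    return {1: list(range(r)), 2: list(range(r, n))}

def tensor_blocks(A, r_a, B, r_b):
    """Blocks C_ij (i, j = 1..4) of A (x) B, regrouped as in (47)."""
    n_b = len(B)
    pa, pb = parts(len(A), r_a), parts(n_b, r_b)
    K = kron(A, B)  # standard ordering: row (ia, ib) sits at ia * n_b + ib
    idx = {k + 1: [ia * n_b + ib for ia in pa[a] for ib in pb[b]]
           for k, (a, b) in enumerate(ORDER)}
    return {(i, j): sub(K, idx[i], idx[j]) for i in idx for j in idx}

def matches_rule(A, r_a, B, r_b):
    """Check C_ij == A_{aa'} (x) B_{bb'} with i <-> (a, b) and j <-> (a', b')."""
    pa, pb = parts(len(A), r_a), parts(len(B), r_b)
    for (i, j), C in tensor_blocks(A, r_a, B, r_b).items():
        (a, b), (a2, b2) = ORDER[i - 1], ORDER[j - 1]
        if C != kron(sub(A, pa[a], pa[a2]), sub(B, pb[b], pb[b2])):
            return False
    return True

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical 3x3 matrix
B = [[1, -1], [2, 5]]                  # hypothetical 2x2 matrix
print(matches_rule(A, 2, B, 1))            # True
print(tensor_blocks(A, 2, B, 1)[(4, 4)])   # [[45]], i.e. A_22 (x) B_22
```

The regrouping is only a permutation of indices, so it does not affect traces, ranks, or spectra of elements of $\mathcal{K}_1\otimes\mathcal{K}_2$.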



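Lemma 8. Let $\mathcal{K}_1, \mathcal{K}_2$ be two subspaces of $\mathbb{C}^{n_1\times n_1}$ and $\mathbb{C}^{n_2\times n_2}$, respectively. Let $C = [C_{ij}]_{i,j=1}^4 \in \mathcal{K}_1\otimes\mathcal{K}_2$ be partitioned as in (47). Suppose that $C \neq 0$ and $C_{ij} = 0$ for $i,j \ge 2$. Write $C$ as a linear combination of the tensor products of the bases of $\mathcal{K}_1$ and $\mathcal{K}_2$ chosen as in (45). Then each term in this linear combination of $C$ is of the form $a\,f\otimes g$, where $a \in \mathbb{C}$, $f \in \mathcal{K}_1$, $g \in \mathcal{K}_2$, and both $f$ and $g$ have one of the forms

$$ \begin{pmatrix} * & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} * & * \\ 0 & 0 \end{pmatrix}, \qquad \text{or} \qquad \begin{pmatrix} * & * \\ * & 0 \end{pmatrix}. $$

In particular, both $f$ and $g$ have a zero (2,2) block.

Remark. It is also possible to show that at least one of the matrices $f$ and $g$ must have a more restricted form. However, we will not be using it here.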

Proof. Suppose the expansion of $C$ contains a term of the form $w^{(1)}_i\otimes w^{(2)}_j$. Look at the block (4,4). The contribution of the expansion of $C$ to this block comes only from the tensor products of the projections of $w^{(1)}_i$ and $w^{(2)}_j$ on the block (2,2). Since these projections are linearly independent, so are their tensor products, and we must have $C_{44} \neq 0$, contrary to our assumption.

Assume now that the expansion of $C$ contains $w^{(1)}_i\otimes z^{(2)}_j$. Since the expansion of $C$ does not have terms $w^{(1)}_i\otimes w^{(2)}_j$, the contribution to the block $C_{43}$ comes only from the projection of $w^{(1)}_i$ on the block (2,2) and the projection of $z^{(2)}_j$ on the block (2,1). Again, as all these projections are linearly independent, we deduce that $C_{43} \neq 0$, contrary to our assumptions.

Similarly, there are no terms in the expansion of $C$ of the form $w^{(1)}_i\otimes y^{(2)}_j$, since $C_{34} = 0$, and there are no terms of the form $w^{(1)}_i\otimes x^{(2)}_j$, since $C_{33} = 0$. That is, we have shown that the matrices $w^{(1)}_i$ do not appear in the expansion of $C$. In exactly the same way, there are no terms in the expansion of $C$ of the form $z^{(1)}_i\otimes w^{(2)}_j$, $y^{(1)}_i\otimes w^{(2)}_j$, and $x^{(1)}_i\otimes w^{(2)}_j$, since $C_{42} = 0$, $C_{24} = 0$, and $C_{22} = 0$, respectively. This completes the proof. □
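The order of the eliminations in this proof can be replayed mechanically. The sketch below tracks only the types of basis elements in (45) (x, y, z, w, according to which blocks they may occupy) and assumes, as in observations 1 to 4 above, that the projections of the least type supported on a block are linearly independent. A pair of types is discarded as soon as it is the only remaining pair contributing to some block $C_{ij}$ with $i,j \ge 2$. This illustrates the bookkeeping only; it is not a substitute for the linear-independence argument, and all names in it are ours.

```python
from itertools import product

TYPES = "xyzw"  # basis types of (45), ordered by growing support
# Least type whose matrices may be nonzero on a given block position.
MIN_TYPE = {(1, 1): "x", (1, 2): "y", (2, 1): "z", (2, 2): "w"}
ORDER = [(1, 1), (1, 2), (2, 1), (2, 2)]  # block index 1..4 <-> (block of f, block of g)

def supports(kind, pos):
    """True if a basis matrix of this type may be nonzero on block pos."""
    return TYPES.index(kind) >= TYPES.index(MIN_TYPE[pos])

def surviving_pairs():
    """Replay the eliminations of Lemma 8 on pairs (type of f, type of g)."""
    alive = set(product(TYPES, repeat=2))
    changed = True
    while changed:
        changed = False
        for i, j in product(range(2, 5), repeat=2):  # blocks with C_ij = 0
            (a, b), (a2, b2) = ORDER[i - 1], ORDER[j - 1]
            pos_f, pos_g = (a, a2), (b, b2)  # C_ij = A_{aa'} (x) B_{bb'}
            contrib = {(f, g) for f, g in alive
                       if supports(f, pos_f) and supports(g, pos_g)}
            minimal = (MIN_TYPE[pos_f], MIN_TYPE[pos_g])
            # Linear independence of the minimal projections forces their
            # coefficients to vanish once they are the only contributors.
            if contrib == {minimal}:
                alive.discard(minimal)
                changed = True
    return alive

remaining = surviving_pairs()
print(sorted(remaining))
print(any("w" in pair for pair in remaining))  # False
```

The surviving pairs involve only the types x, y and z, matching the conclusion of Lemma 8 that no $w^{(i)}_j$ appears.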



B. Local additivity in the singular case 

In this subsection we prove Theorem 5 for the case in which $x^{(1)}$ and $x^{(2)}$ are singular local minima of $\mathcal{K}^{(1)}$ and $\mathcal{K}^{(2)}$, respectively. We therefore choose bases such that $x^{(1)}$ and $x^{(2)}$ are of the form given in Eq. (42), and denote by $r_1$ and $r_2$ their respective ranks.

Assume first that both $x^{(1)}$ and $x^{(2)}$ are non-degenerate local minima. We need to show that $D^2_yE(x) > 0$ for all $y \in x^\perp$, where $x = x^{(1)}\otimes x^{(2)}$. Note that the partition of $x = [x_{ij}]_{i,j=1}^4$ as in Eq. (47) gives $x_{ij} = 0$ for all $i,j = 1,2,3,4$ except for $x_{11} = x^{(1)}_{11}\otimes x^{(2)}_{11}$.

The most general $y \in (x^{(1)}\otimes x^{(2)})^\perp$ can be written as in Eq. (32), where $y'$ is of the form given in Eq. (34). Consider now the partition of $y = [y_{ij}]_{i,j=1}^4$ as in Eq. (47). From Theorem 7 we know that $D^2_yE(x) = +\infty$ unless $y_{ij} = 0$ for all $i,j = 2,3,4$. We therefore assume now that $y_{ij} = 0$ for all $i,j = 2,3,4$. In this case, Lemma 8 implies that all the matrices $y^{(1)}_j$ and $y^{(2)}_j$ in Eq. (34) have the form

$$ \begin{pmatrix} * & * \\ * & 0 \end{pmatrix}. $$

That is, their (2,2) block is zero. For this reason, we replace each subspace $\mathcal{K}^{(i)} \subset \mathbb{C}^{n_i\times n_i}$ ($i = 1,2$) with a smaller subspace $\mathcal{U}^{(i)} \subset \mathcal{K}^{(i)}$ such that each matrix in the basis of $\mathcal{U}^{(i)}$ has zeros on the (2,2) block. It remains to prove that $x = x^{(1)}\otimes x^{(2)}$ is a local minimum in $\mathcal{U}^{(1)}\otimes\mathcal{U}^{(2)}$.

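Consider the new subspace $\mathcal{U}^{(i)}_\epsilon$, for $\epsilon > 0$, obtained by changing, in the orthonormal basis of $\mathcal{U}^{(i)}$, only the first matrix $x^{(i)}$, i.e. the local minimum matrix, to the normalized diagonal matrix

$$ x^{(i)}_\epsilon = \frac{1}{\sqrt{1+(n_i-r_i)\epsilon^2}}\begin{pmatrix} x^{(i)}_{11} & 0_{r_i,\,n_i-r_i} \\ 0_{n_i-r_i,\,r_i} & \epsilon I_{n_i-r_i} \end{pmatrix}, \qquad i = 1,2, $$

where $0_{i,j}$ are $i\times j$ zero matrices and $I_{n_i-r_i}$ are $(n_i-r_i)\times(n_i-r_i)$ identity matrices.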
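Since $x^{(i)}_\epsilon$ differs from a multiple of $x^{(i)}$ only in its (2,2) block, and every other basis matrix of $\mathcal{U}^{(i)}$ vanishes there, the modified basis stays orthonormal. The sketch below checks this for a hypothetical $3\times 3$ example of our own choosing, with $r = 2$, $d = n - r = 1$, a diagonal $x_{11} = \mathrm{diag}(\sqrt{0.6}, \sqrt{0.4})$, and a matrix $y$ with zero (2,2) block and $\mathrm{Tr}(xy^*) = 0$; the helper names are ours.

```python
import math

def x_eps(sqrt_p, d, eps):
    """Normalized diag(sqrt_p..., eps, ..., eps) with d copies of eps.

    Assumes sum(s**2 for s in sqrt_p) == 1, i.e. Tr(x11 x11*) = 1.
    """
    diag = list(sqrt_p) + [eps] * d
    scale = 1 / math.sqrt(1 + d * eps ** 2)
    n = len(diag)
    return [[diag[i] * scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def hs_inner(x, y):
    """Hilbert-Schmidt inner product Tr(x y*)."""
    n = len(x)
    return sum(x[i][j] * complex(y[i][j]).conjugate() for i in range(n) for j in range(n))

sqrt_p = [math.sqrt(0.6), math.sqrt(0.4)]
# Hypothetical y: zero (2,2) block (entry [2][2]) and Tr(x y*) = 0 for x = diag(sqrt_p, 0).
y = [[math.sqrt(0.4), 0.3 + 0.2j, 0.1],
     [0.5j, -math.sqrt(0.6), -0.2],
     [0.4, 0.1 - 0.3j, 0.0]]

for eps in (0.1, 0.01):
    xe = x_eps(sqrt_p, 1, eps)
    # Both numbers should be at rounding level: x_eps is normalized and stays orthogonal to y.
    print(eps, abs(hs_inner(xe, xe) - 1), abs(hs_inner(xe, y)))
```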
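Lemma 9. Assume $x^{(i)}$ is a non-degenerate local minimum in $\mathcal{U}^{(i)}$. Then $x^{(i)}_\epsilon$ is a non-degenerate local minimum in $\mathcal{U}^{(i)}_\epsilon$. Moreover, there exist $\delta > 0$ and $\epsilon_0 > 0$ such that if $\epsilon < \epsilon_0$, then $D^2_{y^{(i)}}E(x^{(i)}_\epsilon) \ge \delta$ for all $y^{(i)} \in (x^{(i)}_\epsilon)^\perp$.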

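Proof. For simplicity of exposition, we drop the superscript $(i)$ from $x^{(i)}$ and denote $d = n - r$. That is, consider

$$ x = \begin{pmatrix} x_{11} & 0_{r,d} \\ 0_{d,r} & 0_{d,d} \end{pmatrix} \qquad\text{and}\qquad x_\epsilon = \frac{1}{\sqrt{1+d\epsilon^2}}\begin{pmatrix} x_{11} & 0_{r,d} \\ 0_{d,r} & \epsilon I_d \end{pmatrix}. $$

We need to show that if $x$ is a non-degenerate local minimum in $\mathcal{U}$, then $x_\epsilon$ is a non-degenerate local minimum in $\mathcal{U}_\epsilon$ for small enough $\epsilon$.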

First, we need to show that $x_\epsilon$ remains critical. Indeed, since the criticality condition (18) is satisfied for $x$, it is also satisfied for $x_\epsilon$. This is because $x_\epsilon$ is a diagonal matrix and every $y \in x^\perp \subset \mathcal{U}$ is of the form

$$ y = \begin{pmatrix} * & * \\ * & 0_{d,d} \end{pmatrix}. $$
Second, we need to show that $D^2_yE(x_\epsilon) \ge \delta$. In Appendix B we show that $D^2_yE(x_\epsilon)$ does not diverge in the limit $\epsilon \to 0$ (assuming $y_{jk} = 0$ when both $j > r$ and $k > r$). Now, since we assume $D^2_yE(x) > 0$ for all $y \in x^\perp$, we can also assume that there exists $\delta' > 0$ such that $D^2_yE(x) \ge \delta'$ for all normalized $y \in x^\perp$. This is true because the set of all normalized matrices in $x^\perp$ is compact. Hence, from the good behaviour of $D^2_yE(x_\epsilon)$ in the limit $\epsilon \to 0$ (see Appendix B), we get that for small enough $\epsilon$ there exists $\delta > 0$ such that $D^2_yE(x_\epsilon) \ge \delta$ for all $y \in (x_\epsilon)^\perp$. This completes the proof of the lemma. □



We now apply Theorem 5 to the non-singular case of $x_\epsilon = x^{(1)}_\epsilon\otimes x^{(2)}_\epsilon$. By Lemma 9, the second derivatives satisfy $D^2_{y^{(i)}}E(x^{(i)}_\epsilon) \ge \delta$ for all $y^{(i)} \in (x^{(i)}_\epsilon)^\perp$ and $i = 1,2$. Thus, we get that $D^2_yE(x_\epsilon) \ge 2\delta$ for all $y \in (x_\epsilon)^\perp$. We obtain this by following precisely the same steps as in the proof of Theorem 5 (in the non-singular case). Letting $\epsilon \to 0$, we deduce that in the direction of $y$ the second derivative at $x^{(1)}\otimes x^{(2)}$ is strictly positive (greater than or equal to $2\delta$). This completes the proof of the main part of Theorem 5 for the singular case.

To prove the second part of the theorem, we now assume that $x^{(1)}$ is a degenerate local minimum and $x^{(2)}$ is a non-degenerate local minimum. In this case we only have $D^2_{y^{(2)}}E(x^{(2)}_\epsilon) \ge \delta$. Nevertheless, in Appendix B we show that $D^2_{y^{(1)}}E(x^{(1)}_\epsilon) - D^2_{y^{(1)}}E(x^{(1)})$ is of order $\epsilon\log\epsilon^2$. Therefore, since $D^2_{y^{(1)}}E(x^{(1)}) \ge 0$, we can choose $\epsilon$ small enough such that $D^2_{y^{(1)}}E(x^{(1)}_\epsilon) \ge -\delta/2$.

As pointed out in the proof of the non-singular case of Theorem 5, the only $y \in x^\perp$ (recall $x = x^{(1)}\otimes x^{(2)}$) for which it is possible to have $D^2_yE(x) = 0$ is $y = y^{(1)}\otimes x^{(2)}$. However, the equality in Eq. (41) implies that $x$ is a local minimum in this direction, and this remains true even if the $x^{(i)}$ are singular. We therefore assume now that $y$ is not of the form $y^{(1)}\otimes x^{(2)}$. By following precisely the same steps as in the proof of Theorem 5 (in the non-singular case), we get that for all other $y \in x^\perp$ we have $D^2_yE(x_\epsilon) \ge \delta - \delta/2 = \delta/2$. Letting $\epsilon \to 0$, we therefore get $D^2_yE(x) > 0$. This completes the proof of the second part of Theorem 5. □
V. DISCUSSION



We have shown that the minimum entropy output of a quantum channel is locally additive (assuming that at least one of the two local minima is non-degenerate). Our proof has two key ingredients. The first is the divided difference approach, which enabled us to calculate directional derivatives explicitly; the second is the explicit use of the complex structure. In Appendix B of [10] we show that there exist counterexamples to local additivity over the real numbers. These counterexamples preclude the existence of a more straightforward differentiation argument than the complex-structure-based argument given here.

The fact that the minimum entropy output is not globally additive makes local additivity of even greater interest to quantum information theorists. It suggests that the cases of non-additivity of the minimum entropy output are caused by some global feature of the quantum channels involved. Perhaps one way to improve our understanding in this direction is to study properties of generic channels. In particular, it seems quite possible to us that for generic channels (or generic subspaces) the entropy output has a finite number of isolated non-degenerate critical points.

Acknowledgments. We acknowledge many fruitful discussions with A. Roy and J. Yard in the earlier stages of this work. GG's research is supported by NSERC. The authors acknowledge support from PIMS CRG MQI, MITACS, and iCore for Shmuel Friedland's visits to IQIS in Calgary.



Appendix A: Proof of Theorem 7



Proof. Let $\lambda_1(t) \ge \dots \ge \lambda_n(t) \ge 0$, for $t \ge 0$, be the eigenvalues of $\rho(t)$. Rellich's theorem yields that each $\lambda_i(t)$ is analytic in $t$ in a neighbourhood of $t = 0$. So $\lambda_i(0) = \lambda_i(\rho) > 0$ for $i = 1,\dots,r$ and $\lambda_i(0) = 0$ for $i = r+1,\dots,n$. Since each $\lambda_i(t) \ge 0$, the Taylor expansion of each $\lambda_i(t) \not\equiv 0$, for $i > r$, must start with $t$ to a positive even power times a positive constant, i.e. $\lambda_i(t) = \lambda_{i,2n_i}t^{2n_i}(1+O(t))$, where $\lambda_{i,2n_i} > 0$ and $n_i$ is a positive integer, for $i > r$. This shows that $S(\rho(t)) = -\sum_{i=1}^n \lambda_i(t)\log\lambda_i(t)$ must be of the form (43). Furthermore, $K = 0$ if and only if $n_i \ge 2$ for all $i > r$. So if $K = 0$ and not all $\lambda_i(t)$ are identically zero for $i > r$, then $k = \min\{n_i - 1 : \lambda_{i,2n_i} > 0\}$.

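It remains to show that $K = \mathrm{Tr}(y_{22}y_{22}^*)$. Let

$$ X = \begin{pmatrix} 0 & x \\ x^* & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & y \\ y^* & 0 \end{pmatrix}. $$

Recall that the pencil $X + tY$ has $n$ nonnegative and $n$ nonpositive eigenvalues

$$ \sigma_1(t) \ge \dots \ge \sigma_n(t) \ge 0 \ge -\sigma_n(t) \ge \dots \ge -\sigma_1(t). $$

The singular values of $x + ty$ are the $n$ nonnegative eigenvalues of $X + tY$. Hence, the eigenvalues of $\rho(t)$ are $\sigma_i(t)^2/(1+t^2)$ for $i = 1,\dots,n$. Let $\sigma_i(t) = \sigma_{i,1}t + O(t^2)$ for $t \ge 0$ and $i > r$. Hence the coefficient of $t^2$ in the $i$-th eigenvalue of $\rho(t)$, for $i > r$, is $\sigma_{i,1}^2$. Thus $K = \sum_{i=r+1}^n \sigma_{i,1}^2$.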



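Let $P \in \mathbb{C}^{2n\times 2n}$ be the orthogonal projection onto the zero eigenspace of $X$. Then $PYP((I-P)\mathbb{C}^{2n}) = 0$. The other possibly nonzero eigenvalues of $PYP$ are $\sigma_{r+1,1} \ge \dots \ge \sigma_{n,1} \ge 0 \ge -\sigma_{n,1} \ge \dots \ge -\sigma_{r+1,1}$, which are the eigenvalues of the restriction of $PYP$ to the kernel of $X$ [1, 18] or [0, §3.8]. The restriction of $PYP$ to the kernel of $X$ is

$$ \begin{pmatrix} 0 & y_{22} \\ y_{22}^* & 0 \end{pmatrix}, $$

obtained by deleting the corresponding rows and columns of $Y$. Hence

$$ 2K = 2\sum_{i=r+1}^n \sigma_{i,1}^2 = \mathrm{Tr}\big((PYP)^2\big) = \mathrm{Tr}(y_{22}y_{22}^* + y_{22}^*y_{22}). $$

Since $\mathrm{Tr}(y_{22}^*y_{22}) = \mathrm{Tr}(y_{22}y_{22}^*)$, this gives $K = \mathrm{Tr}(y_{22}y_{22}^*)$. This completes the proof. □

Appendix B: Formula for the second derivative in the singular case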



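Proposition 10. Let

$$ x = \begin{pmatrix} x_{11} & 0_{r,n-r} \\ 0_{n-r,r} & 0_{n-r,n-r} \end{pmatrix}, \qquad y = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & 0_{n-r,n-r} \end{pmatrix}, \tag{B1} $$

where $0_{i,j}$ are $i\times j$ zero matrices, $x_{11}, y_{11} \in \mathbb{C}^{r\times r}$, $y_{12} \in \mathbb{C}^{r\times(n-r)}$, and $y_{21} \in \mathbb{C}^{(n-r)\times r}$. We also assume that $x_{11} = \mathrm{diag}\{\sqrt{p_1},\dots,\sqrt{p_r}\}$ is non-singular. Then the limit of $D^2_yE(x_\epsilon)$ as $\epsilon$ goes to zero exists and equals

$$ \lim_{\epsilon\to 0} D^2_yE(x_\epsilon) = D^2_{y_{11}}E(x_{11}) - 2\,\mathrm{Tr}\big[(y_{12}y_{12}^* + y_{21}^*y_{21})\log\rho_{11}\big], \tag{B2} $$

where $\rho_{11} = x_{11}x_{11}^*$.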

Remark. The contribution of the normalization factor of $x_\epsilon$ is of order $O(\epsilon^2)$ and is therefore ignored here.

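Proof. The proof is based on a straightforward calculation. The expression for the second derivative given in Eq. (17) can be written as

$$ D^2_yE(x_\epsilon) = \frac{d^2}{dt^2}S(\rho_\epsilon(t))\Big|_{t=0} = -2\Big(S(\rho_\epsilon) + \sum_{j=1}^n\sum_{k=1}^n G_{jk}\Big), $$

where $\rho_\epsilon = x_\epsilon x_\epsilon^*$, $\gamma_0 = x_\epsilon y^* + y x_\epsilon^*$, and

$$ G_{jk} = |y_{jk}|^2\log p_j + \frac{\log p_j - \log p_k}{2(p_j - p_k)}\,|(\gamma_0)_{jk}|^2, \tag{B3} $$

with the quotient interpreted as $1/p_j$ when $p_j = p_k$.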
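Formula (B3) can be sanity-checked in the simplest non-singular case. The sketch below takes $n = 2$, $x = \mathrm{diag}(\sqrt{0.7}, \sqrt{0.3})$ and a hypothetical normalized $y$ with $\mathrm{Tr}(xy^*) = 0$ (our choice), computes $S(\rho(t))$ for $\rho(t) = (x+ty)(x+ty)^*/(1+t^2)$ from the closed-form eigenvalues of a $2\times 2$ Hermitian matrix, and compares a central finite difference at $t = 0$ with $-2\big(S(\rho) + \sum_{j,k}G_{jk}\big)$. The helper names are ours.

```python
import math

def eig2(P):
    """Eigenvalues of a 2x2 Hermitian matrix, largest first."""
    tr = (P[0][0] + P[1][1]).real
    det = (P[0][0] * P[1][1] - P[0][1] * P[1][0]).real
    big = (tr + math.sqrt(tr * tr - 4 * det)) / 2
    return big, det / big  # det/big avoids cancellation in the small root

def entropy_along(x, y, t):
    """S(rho(t)) with rho(t) = (x+ty)(x+ty)*/(1+t^2), for 2x2 x and y."""
    m = [[x[i][j] + t * y[i][j] for j in range(2)] for i in range(2)]
    rho = [[sum(m[i][k] * m[j][k].conjugate() for k in range(2)) / (1 + t * t)
            for j in range(2)] for i in range(2)]
    return -sum(l * math.log(l) for l in eig2(rho) if l > 0)

def dd_log(a, b):
    """Divided difference (log a - log b)/(a - b), equal to 1/a when a == b."""
    return 1 / a if a == b else (math.log(a) - math.log(b)) / (a - b)

def formula_b3(p, y):
    """-2 (S(rho) + sum_jk G_jk) for x = diag(sqrt(p_j)), with G_jk as in (B3)."""
    n = len(p)
    total = -sum(pj * math.log(pj) for pj in p)  # S(rho)
    for j in range(n):
        for k in range(n):
            g0 = math.sqrt(p[j]) * y[k][j].conjugate() + math.sqrt(p[k]) * y[j][k]  # (gamma_0)_jk
            total += abs(y[j][k]) ** 2 * math.log(p[j]) + dd_log(p[j], p[k]) * abs(g0) ** 2 / 2
    return -2 * total

p = [0.7, 0.3]
x = [[math.sqrt(p[0]), 0.0], [0.0, math.sqrt(p[1])]]
raw = [[0.5 * math.sqrt(p[1]), 0.5 + 0.2j], [0.6 - 0.1j, -0.5 * math.sqrt(p[0])]]  # Tr(x raw*) = 0
norm = math.sqrt(sum(abs(v) ** 2 for row in raw for v in row))
y = [[complex(v) / norm for v in row] for row in raw]

h = 1e-4
finite_diff = (entropy_along(x, y, h) - 2 * entropy_along(x, y, 0.0)
               + entropy_along(x, y, -h)) / h ** 2
print(finite_diff, formula_b3(p, y))  # the two values should agree closely
```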

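If both $j$ and $k$ are less than or equal to $r$, then clearly these $G_{jk}$ terms contribute to $D^2_{y_{11}}E(x_{11})$. Now, if both $j > r$ and $k > r$, then $y_{jk} = 0$ and also $(\gamma_0)_{jk} = \epsilon(\bar y_{kj} + y_{jk}) = 0$, so $G_{jk} = 0$. Hence, up to terms that vanish as $\epsilon \to 0$, we get

$$ D^2_yE(x_\epsilon) = D^2_{y_{11}}E(x_{11}) - 2\sum_{j=r+1}^n\sum_{k=1}^r (G_{jk} + G_{kj}). \tag{B4} $$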

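We therefore focus now on the expressions for $G_{jk}$ and $G_{kj}$ in the case $j > r$ and $k \le r$. Writing $x_\epsilon = \mathrm{diag}\{\sqrt{p_1},\dots,\sqrt{p_n}\}$ with $p_j = \epsilon^2$ for $j > r$, we have

$$ (\gamma_0)_{jk} = \sqrt{p_j}\,\bar y_{kj} + \sqrt{p_k}\,y_{jk} = \sqrt{p_k}\,y_{jk} + O(\epsilon), $$
$$ (\gamma_0)_{kj} = \sqrt{p_k}\,\bar y_{jk} + \sqrt{p_j}\,y_{kj} = \sqrt{p_k}\,\bar y_{jk} + O(\epsilon), $$

where the last equalities were obtained by setting $p_j = \epsilon^2$. We therefore have $|(\gamma_0)_{jk}|^2 = |(\gamma_0)_{kj}|^2 = p_k|y_{jk}|^2 + O(\epsilon)$. From the expressions above we get, for $j > r$ and $k \le r$, the following formulas: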

$$ G_{jk} = |y_{jk}|^2\log p_j + \frac{\log p_j - \log p_k}{2(p_j - p_k)}\big(p_k|y_{jk}|^2 + O(\epsilon)\big), $$
$$ G_{kj} = |y_{kj}|^2\log p_k + \frac{\log p_k - \log p_j}{2(p_k - p_j)}\big(p_k|y_{jk}|^2 + O(\epsilon)\big). $$

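Since $p_j = \epsilon^2$ and $p_k > 0$, we have

$$ G_{jk} = |y_{jk}|^2\log\epsilon^2 + \frac{\log\epsilon^2 - \log p_k}{2(\epsilon^2 - p_k)}\big(p_k|y_{jk}|^2 + O(\epsilon)\big) = \tfrac12|y_{jk}|^2\log\epsilon^2 + \tfrac12|y_{jk}|^2\log p_k + O(\epsilon\log\epsilon), $$
$$ G_{kj} = |y_{kj}|^2\log p_k + \frac{\log p_k - \log\epsilon^2}{2(p_k - \epsilon^2)}\big(p_k|y_{jk}|^2 + O(\epsilon)\big) = |y_{kj}|^2\log p_k + \tfrac12|y_{jk}|^2\log p_k - \tfrac12|y_{jk}|^2\log\epsilon^2 + O(\epsilon\log\epsilon). $$

Hence,

$$ G_{jk} + G_{kj} = |y_{jk}|^2\log p_k + |y_{kj}|^2\log p_k + O(\epsilon\log\epsilon). $$

Substituting this expression into Eq. (B4) and using $\sum_{j>r}\sum_{k\le r}\big(|y_{jk}|^2 + |y_{kj}|^2\big)\log p_k = \mathrm{Tr}\big[(y_{12}y_{12}^* + y_{21}^*y_{21})\log\rho_{11}\big]$, we get (B2). This completes the proof. □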
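The cancellation of the $\log\epsilon^2$ terms in $G_{jk} + G_{kj}$ can be observed numerically. The sketch below uses hypothetical values of $p_k$, $y_{jk}$ and $y_{kj}$ (our choice), evaluates $G_{jk}$ and $G_{kj}$ exactly from (B3) with $p_j = \epsilon^2$ and $(\gamma_0)_{jk} = \sqrt{p_j}\,\bar y_{kj} + \sqrt{p_k}\,y_{jk}$ (normalization of $x_\epsilon$ ignored, as in the Remark), and prints the deviation from $(|y_{jk}|^2 + |y_{kj}|^2)\log p_k$, which should shrink roughly like $\epsilon\log\epsilon$.

```python
import math

def dd_log(a, b):
    """Divided difference (log a - log b)/(a - b), equal to 1/a when a == b."""
    return 1 / a if a == b else (math.log(a) - math.log(b)) / (a - b)

def g_term(p_j, p_k, y_jk, y_kj):
    """G_jk of (B3) for x_eps = diag(sqrt(p)); (gamma_0)_jk = sqrt(p_j) conj(y_kj) + sqrt(p_k) y_jk."""
    g0 = math.sqrt(p_j) * y_kj.conjugate() + math.sqrt(p_k) * y_jk
    return abs(y_jk) ** 2 * math.log(p_j) + dd_log(p_j, p_k) * abs(g0) ** 2 / 2

def mixed_sum(eps, p_k, y_jk, y_kj):
    """G_jk + G_kj for an index j > r (p_j = eps^2) and an index k <= r."""
    p_j = eps ** 2
    return g_term(p_j, p_k, y_jk, y_kj) + g_term(p_k, p_j, y_kj, y_jk)

# Hypothetical data: an eigenvalue p_k of rho_11, an entry y_jk of y_21 and an entry y_kj of y_12.
p_k, y_jk, y_kj = 0.4, 0.3 + 0.1j, -0.2 + 0.5j
limit = (abs(y_jk) ** 2 + abs(y_kj) ** 2) * math.log(p_k)
errors = [abs(mixed_sum(eps, p_k, y_jk, y_kj) - limit) for eps in (1e-2, 1e-4, 1e-6)]
print(errors)  # decreasing, roughly like eps * |log eps|
```

Each individual $G_{jk}$ or $G_{kj}$ still diverges like $\log\epsilon^2$; only their sum has a finite limit, which is the point of pairing them in (B4).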